Abstract: We establish several new results on Marton's coding 
scheme and its corresponding inner bound on the capacity region 
of the general broadcast channel. We show that unlike the 
Gaussian case, Marton's coding scheme without superposition 
coding is not optimal in general even for a degraded broadcast 
channel with no common message. We then establish properties 
of Marton's inner bound that help restrict the search space for 
computing the sum-rate. Next, we show that the inner bound is 
optimal along certain directions. Finally, we propose a coding 
scheme that may lead to a larger inner bound. 

I. Introduction 

In this paper, we consider the general two-receiver broadcast channel with input alphabet $\mathcal{X}$, output alphabets $\mathcal{Y}$ and $\mathcal{Z}$, and conditional probability distribution $q(y,z|x)$. The capacity region of this channel is defined as the set of rate triples $(R_0, R_1, R_2)$ such that the sender $X$ can reliably communicate a common message at rate $R_0$ to both receivers and two private messages at rates $R_1$ and $R_2$ to receivers $Y$ and $Z$, respectively; see [1] or [2]. The capacity region of this channel is known for several special cases but is unknown in general. The best known general inner bound on the capacity region is due to Marton [3], [6].

In this paper, we study Marton's inner bound. Marton's 
inner bound for a general two-receiver discrete-memoryless 
broadcast channel is as follows: 

Marton's inner bound [3], [6]: The union of non-negative rate triples $(R_0, R_1, R_2)$ satisfying the inequalities

$$R_0 + R_1 \le I(UW;Y), \qquad (1)$$
$$R_0 + R_2 \le I(VW;Z), \qquad (2)$$
$$R_0 + R_1 + R_2 \le I(UW;Y) + I(V;Z|W) - I(U;V|W), \qquad (3)$$
$$R_0 + R_1 + R_2 \le I(U;Y|W) + I(VW;Z) - I(U;V|W), \qquad (4)$$
$$2R_0 + R_1 + R_2 \le I(UW;Y) + I(VW;Z) - I(U;V|W), \qquad (5)$$

for some random variables $(U,V,W,X,Y,Z) \sim p(u,v,w,x)q(y,z|x)$ constitutes an inner bound to the capacity region. Further, to compute this region it suffices to consider $|\mathcal{U}|, |\mathcal{V}| \le |\mathcal{X}|$, $|\mathcal{W}| \le |\mathcal{X}| + 4$ and to assume that $X$ is a deterministic function of $(U,V,W)$ [8].
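For readers who want to experiment with (1)-(5) on small examples, the following Python/NumPy sketch evaluates the five right-hand sides for a given finite joint pmf $p(u,v,w,x)$ and channel $q(y,z|x)$. The function names `entropy`, `marginal` and `marton_bounds` and the array layout are our own illustrative choices, not part of the paper; the sketch neither enforces the cardinality bounds nor optimizes over distributions.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a pmf stored in an array of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def marginal(joint, keep):
    """Sum a joint pmf over every axis not listed in `keep`."""
    drop = tuple(ax for ax in range(joint.ndim) if ax not in keep)
    return joint.sum(axis=drop)

def marton_bounds(p_uvwx, q_yz_x):
    """Right-hand sides of (1)-(5) for p(u,v,w,x) and channel q(y,z|x).

    p_uvwx: shape (|U|, |V|, |W|, |X|); q_yz_x: shape (|X|, |Y|, |Z|).
    Axes of the full joint: 0=U, 1=V, 2=W, 3=X, 4=Y, 5=Z.
    """
    joint = p_uvwx[..., None, None] * q_yz_x[None, None, None, :, :, :]
    U, V, W, X, Y, Z = range(6)

    def H(*axes):
        return entropy(marginal(joint, axes))

    I_UW_Y = H(U, W) + H(Y) - H(U, W, Y)              # I(UW;Y)
    I_VW_Z = H(V, W) + H(Z) - H(V, W, Z)              # I(VW;Z)
    I_U_Y_W = H(U, W) + H(W, Y) - H(U, W, Y) - H(W)   # I(U;Y|W)
    I_V_Z_W = H(V, W) + H(W, Z) - H(V, W, Z) - H(W)   # I(V;Z|W)
    I_U_V_W = H(U, W) + H(V, W) - H(U, V, W) - H(W)   # I(U;V|W)
    return {
        "(1)": I_UW_Y,
        "(2)": I_VW_Z,
        "(3)": I_UW_Y + I_V_Z_W - I_U_V_W,
        "(4)": I_U_Y_W + I_VW_Z - I_U_V_W,
        "(5)": I_UW_Y + I_VW_Z - I_U_V_W,
    }

# Example: U = X uniform on {0,1}, V and W constant, Y = X, Z constant.
p = np.zeros((2, 1, 1, 2))
p[0, 0, 0, 0] = p[1, 0, 0, 1] = 0.5
q = np.zeros((2, 2, 1))
q[0, 0, 0] = q[1, 1, 0] = 1.0
bounds = marton_bounds(p, q)
print(bounds)
```

In this example every bound except (2) equals 1 bit, and (2) equals 0, as expected when all the rate can go to receiver $Y$.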



In this paper we prove various results related to this inner 
bound. 

Insufficiency of Marton's coding scheme without a superposition variable: The random variable $W$ corresponds to the "superposition-coding" aspect of the bound, and the random variables $U$ and $V$ correspond to the "Marton-coding" aspect of the bound. The necessity of the "superposition-coding" aspect of the inner bound had previously been observed for a non-degraded broadcast channel [10]. For degraded channels, it is known that $W$ is unnecessary for achieving the capacity region of Gaussian broadcast channels (through dirty paper coding) [14]. We show that, unlike in the Gaussian broadcast channel case, "Marton's coding scheme" alone is not sufficient to achieve the capacity region of the general degraded broadcast channel.

Computing the sum rate in Marton's inner bound: Given a broadcast channel $q(y,z|x)$, the maximum sum rate achievable via Marton's strategy is given by

$$\max_{p(u,v,w,x)} \min\{I(W;Y), I(W;Z)\} + I(U;Y|W) + I(V;Z|W) - I(U;V|W). \qquad (6)$$

Further, it suffices to consider $|\mathcal{U}|, |\mathcal{V}| \le |\mathcal{X}|$, $|\mathcal{W}| \le |\mathcal{X}| + 1$ and to assume that $X$ is a deterministic function of $(U,V,W)$ [8]. Note that $\min\{I(W;Y), I(W;Z)\}$ depends only on $p(w,x)$. The last three terms $I(U;Y|W) + I(V;Z|W) - I(U;V|W)$ can be written as

$$\sum_w p(w)\big(I(U;Y|W=w) + I(V;Z|W=w) - I(U;V|W=w)\big).$$

Let us write the above optimization as follows:

$$\max_{p(w,x)} \Big[\min\{I(W;Y), I(W;Z)\} + \sum_w p(w) \max_{p(u,v|w,x)} \big(I(U;Y|W=w) + I(V;Z|W=w) - I(U;V|W=w)\big)\Big].$$

One can think of the maximization in the following way:

$$\max_{p(w,x)} \min\{I(W;Y), I(W;Z)\} + \sum_w p(w)\, T\big(p(x|w)\big),$$

where $T(p(x))$ is the maximum of $I(U;Y) + I(V;Z) - I(U;V)$ over all $p(u,v|x)$, where $X \sim p(x)$ and $X$ is a deterministic function of $(U,V)$.

It is shown in [15] that the latter maximization problem concerning $T(p(x))$ has a remarkable solution for all binary-input broadcast channels: it suffices to take $U = X$ and $V$ constant, or $V = X$ and $U$ constant. In other words, for all binary-input broadcast channels we have

$$I(U;Y) + I(V;Z) - I(U;V) \le \max\{I(X;Y), I(X;Z)\}. \qquad (7)$$

To prove this, the authors of [15] consider different mappings from $\mathcal{U} \times \mathcal{V}$ to $\mathcal{X}$. Because of the cardinality bound of two on $\mathcal{U}$ and $\mathcal{V}$, the authors argue that the XOR mapping (i.e., $X = U \oplus V$ mod 2) and the AND mapping (i.e., $X = U \wedge V$) cannot occur in any maximizer of $I(U;Y) + I(V;Z) - I(U;V)$.
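Inequality (7) can be probed numerically. The Python/NumPy sketch below is our own illustration, not the proof technique of the cited work: it draws random binary-input channels $q(y|x)$, $q(z|x)$ and random $p(u,v,x)$ with $|\mathcal{U}| = |\mathcal{V}| = 2$ and $X$ a deterministic function of $(U,V)$, and records the largest observed value of the left side of (7) minus its right side. The sampling scheme, the output alphabet sizes and the seed are arbitrary assumptions; a non-positive result is consistent with (7) but is of course not a proof.

```python
import numpy as np

def mi_bits(p_ab):
    """Mutual information I(A;B) in bits from a 2-D joint pmf."""
    pa = p_ab.sum(axis=1, keepdims=True)
    pb = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0
    return float((p_ab[mask] * np.log2(p_ab[mask] / (pa * pb)[mask])).sum())

def marton_objective(p_uvx, qy, qz):
    """I(U;Y) + I(V;Z) - I(U;V) for p(u,v,x) and channels q(y|x), q(z|x)."""
    p_uv = p_uvx.sum(axis=2)
    p_uy = np.einsum("uvx,xy->uy", p_uvx, qy)
    p_vz = np.einsum("uvx,xz->vz", p_uvx, qz)
    return mi_bits(p_uy) + mi_bits(p_vz) - mi_bits(p_uv)

def rhs_of_7(px, qy, qz):
    """max{I(X;Y), I(X;Z)} for the input pmf px."""
    return max(mi_bits(px[:, None] * qy), mi_bits(px[:, None] * qz))

rng = np.random.default_rng(0)
worst_gap = -np.inf
for _ in range(2000):
    qy = rng.dirichlet(np.ones(2), size=2)      # binary-input channel to Y, |Y| = 2
    qz = rng.dirichlet(np.ones(3), size=2)      # binary-input channel to Z, |Z| = 3
    xi = rng.integers(0, 2, size=(2, 2))        # X = xi(U, V) with |U| = |V| = 2
    p_uv = rng.dirichlet(np.ones(4)).reshape(2, 2)
    p_uvx = np.zeros((2, 2, 2))
    for u in range(2):
        for v in range(2):
            p_uvx[u, v, xi[u, v]] = p_uv[u, v]
    px = p_uvx.sum(axis=(0, 1))
    worst_gap = max(worst_gap, marton_objective(p_uvx, qy, qz) - rhs_of_7(px, qy, qz))
print(f"largest sampled LHS - RHS of (7): {worst_gap:.3e}")
```

Setting $U = X$ and $V$ constant makes the left side of (7) equal to $I(X;Y)$, which is how the right side is attained.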

We believe that finding the correct extension of (7) to larger alphabets can be useful in (a) computing Marton's inner bound efficiently for a given channel, and (b) comparing Marton's inner bound with its multi-letter characterizations to see whether or not Marton's inner bound is optimal (see [11] for a discussion of this line of attack).

One of the main results of this part is a generalization to larger alphabets of the statement that the XOR mapping cannot occur. We show that, in any maximizer, one cannot find distinct $u_0, u_1$ in $\mathcal{U}$, distinct $v_0, v_1$ in $\mathcal{V}$ and distinct $x_0, x_1$ in $\mathcal{X}$ such that $p(x_0|u_0,v_0) = p(x_0|u_1,v_1) = p(x_1|u_1,v_0) = p(x_1|u_0,v_1) = 1$.

Optimality of Marton's inner bound along certain directions: We compute the maximum of $\lambda_0 R_0 + \lambda_1 R_1 + \lambda_2 R_2$ over all $(R_0, R_1, R_2)$ in the capacity region, where $\lambda_0$, $\lambda_1$ and $\lambda_2$ are real numbers such that $\lambda_0 \ge \lambda_1 + \lambda_2$. We observe that Marton's inner bound is tight along these directions.

An achievable region: Since capacity is defined in the limit 
of large block length, it is natural to expect that optimal coding 
schemes have an invariant structure with respect to shifts in 
time. This suggests that capacity should be expressed via a 
formula that has a fixed-point character, namely it should 
involve joint distributions that are invariant under a time shift. 
Following this general idea, we propose a new inner bound 
for the capacity region. We don't know if the proposed inner 
bound is strictly better than Marton's inner bound. 

The rest of the paper is organized as follows. Section II contains the main results of the paper, and Section III contains the proofs of these results, with some of the details relegated to the appendices.

II. Main Results

Let $\mathcal{C}(q(y,z|x))$ denote the capacity region of the broadcast channel $q(y,z|x)$, and $\mathcal{C}_M(q(y,z|x))$ denote Marton's inner bound for the channel $q(y,z|x)$, defined in the introduction by equations (1)-(5). The notation $X^i$ is used to denote the vector $(X_1, X_2, \ldots, X_i)$, and $X_i^n$ to denote $(X_i, X_{i+1}, \ldots, X_n)$.

A. Insufficiency of Marton's coding scheme without a superposition variable

In Marton's inner bound, the auxiliary random variable $W$ corresponds to the "superposition-coding" aspect of the bound, and the random variables $U$ and $V$ correspond to the "Marton-coding" aspect of the bound. When $R_0 = 0$ (private messages only) and $W = \emptyset$ (i.e., $W$ is constant), Marton's inner bound reduces to the set of non-negative rate pairs $(R_1, R_2)$ satisfying

$$R_1 \le I(U;Y|Q), \qquad (8)$$
$$R_2 \le I(V;Z|Q), \qquad (9)$$
$$R_1 + R_2 \le I(U;Y|Q) + I(V;Z|Q) - I(U;V|Q), \qquad (10)$$

for some random variables $(Q,U,V,X,Y,Z) \sim p(q)p(u,v,x|q)q(y,z|x)$.

It is known that this inner bound is tight for Gaussian broadcast channels (through dirty paper coding), implying that $W$ is unnecessary for achieving the capacity region of this class of degraded broadcast channels [14]. We show through an example that this is not the case in general.

Claim 1: There are degraded broadcast channels for which 
Marton's private message inner bound without W is strictly 
contained in the capacity region of the channel (which is 
known to equal the Marton region with superposition variable 
in the case of degraded channels). 

B. Computing the sum-rate for Marton's Inner Bound

1) Extensions of the binary inequality: In this subsection we are concerned with the following maximization problem, which is closely related to the calculation of the sum rate for Marton's inner bound: given $p(x)$, maximize $I(U;Y) + I(V;Z) - I(U;V)$ over all $p(u,v|x)$ where $X$ is a function of $(U,V)$.

To state our main result we need the following two definitions:

Definition 1: The input symbols $x_0$ and $x_1$ are said to be indistinguishable by the channel if $q(y|x_0) = q(y|x_1)$ for all $y$, and $q(z|x_0) = q(z|x_1)$ for all $z$. A channel $q(y,z|x)$ is said to be irreducible if no two of its input symbols are indistinguishable by the channel.

Definition 2: Let $\mathcal{U} = \{u_1, u_2, \ldots, u_{|\mathcal{U}|}\}$ and $\mathcal{V} = \{v_1, \ldots, v_{|\mathcal{V}|}\}$ be finite sets, and let $\xi$ be a deterministic mapping from $\mathcal{U} \times \mathcal{V}$ to $\mathcal{X}$. One can represent the mapping by a table having $|\mathcal{U}|$ rows and $|\mathcal{V}|$ columns; the rows are indexed by $u_1, u_2, \ldots, u_{|\mathcal{U}|}$ and the columns by $v_1, v_2, \ldots, v_{|\mathcal{V}|}$. In cell $(i,j)$ we write $\xi(u_i, v_j)$, the symbol $x$ to which $(u_i, v_j)$ is mapped. The profile of the $i$-th row is defined to be a vector of size $|\mathcal{X}|$ counting the number of occurrences of the elements of $\mathcal{X}$ in the $i$-th row. In other words, if $\mathcal{X} = \{x_1, x_2, \ldots, x_{|\mathcal{X}|}\}$, the $k$-th element of the profile of the $i$-th row is the number of times that $x_k$ shows up in the $i$-th row of the table. The profile of the $j$-th column is defined similarly. Define the profile of the table to be the vector of size $(|\mathcal{U}| + |\mathcal{V}|)|\mathcal{X}|$ formed by concatenating the profile vectors of the rows and the columns of the table. The profile vector of the mapping $\xi$ is denoted by $v_\xi$.
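A minimal Python/NumPy sketch of Definition 2, assuming the mapping is stored as an integer table whose entry $(i,j)$ is the index of $\xi(u_i, v_j)$ in $\mathcal{X}$. The helper name `profile` and the example table are hypothetical. The example also previews the XOR pattern discussed after Theorem 1: swapping $x_0$ and $x_1$ in the four cells of that pattern yields a different mapping with the same profile vector.

```python
import numpy as np

def profile(table, num_x):
    """Profile vector of a mapping xi: U x V -> X given as an integer table.

    table[i, j] = index of xi(u_i, v_j) in {0, ..., num_x - 1}.
    Returns the row profiles followed by the column profiles, concatenated
    into a vector of length (|U| + |V|) * |X|.
    """
    table = np.asarray(table)
    rows = [np.bincount(r, minlength=num_x) for r in table]
    cols = [np.bincount(c, minlength=num_x) for c in table.T]
    return np.concatenate(rows + cols)

# A 3 x 3 mapping into X = {0, 1, 2} containing the XOR pattern on rows {0, 1}
# and columns {0, 1}: xi(u0,v0) = xi(u1,v1) = 0 and xi(u1,v0) = xi(u0,v1) = 1.
xi = np.array([[0, 1, 2],
               [1, 0, 2],
               [2, 2, 1]])

# Swap the symbols 0 and 1 in those four cells.
xi_swapped = xi.copy()
xi_swapped[:2, :2] = 1 - xi[:2, :2]

print(profile(xi, 3))
print(np.array_equal(profile(xi, 3), profile(xi_swapped, 3)))  # True
```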

We now state the main result of this subsection. 

Theorem 1: Take an arbitrary irreducible broadcast channel $q(y,z|x)$ where $q(y|x) > 0$ and $q(z|x) > 0$ for all $x, y, z$. Fix some $p(x)$. Take any $p(u,v|x)$ maximizing $I(U;Y) + I(V;Z) - I(U;V)$, where $X$ is a function of $(U,V)$. Without loss of generality assume that $p(u) > 0$ for all $u \in \mathcal{U}$ and $p(v) > 0$ for all $v \in \mathcal{V}$. Let $x = \xi(u,v)$ denote the deterministic mapping from $\mathcal{U} \times \mathcal{V}$ to $\mathcal{X}$. Then all of the following conditions must hold:

• $p(u,v) > 0$, $p(u,y) > 0$ and $p(v,z) > 0$ for all $u, v, y$ and $z$.

• The profile vector of the mapping $\xi$, $v_\xi$, cannot be written as

$$v_\xi = \sum_{t=1}^{M} \alpha_t v_{\xi_t},$$

where $\xi_t$ (for $t = 1, 2, \ldots, M$) are deterministic mappings from $\mathcal{U} \times \mathcal{V}$ to $\mathcal{X}$ not equal to $\xi$, and the $\alpha_t$ are non-negative numbers adding up to one, i.e., $\sum_{t=1}^{M} \alpha_t = 1$.

• Let the functions

$f_u : \mathcal{X} \to \mathbb{R}$ for every $u \in \mathcal{U}$,
$g_v : \mathcal{X} \to \mathbb{R}$ for every $v \in \mathcal{V}$,
and $h : \mathcal{X} \to \mathbb{R}$

be defined by

$$f_u(x) = \sum_y q(y|x) \log p(u,y),$$
$$g_v(x) = \sum_z q(z|x) \log p(v,z),$$
$$h(x) = \min_{u' \in \mathcal{U},\, v' \in \mathcal{V}} \big(\log p(u',v') - f_{u'}(x) - g_{v'}(x)\big).$$

These definitions make sense because of the first bullet of this theorem. Then, for any $u$ and $v$, the following two statements hold:

$$\log p(u,v) = \max_x \big[f_u(x) + g_v(x) + h(x)\big],$$

and

$$p(x_0|u,v) = 1 \text{ for some } x_0 \in \mathcal{X} \implies x_0 \in \arg\max_x \big[f_u(x) + g_v(x) + h(x)\big].$$
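To make the third bullet concrete, the following Python/NumPy sketch computes $f_u$, $g_v$ and $h$ for a given $p(u,v,x)$ and strictly positive channels, using natural logarithms (changing the base rescales every term and leaves the statements unchanged). The function name and the random test distribution are our own assumptions. For an arbitrary, non-optimal $p(u,v,x)$ only the inequality $\log p(u,v) \ge \max_x [f_u(x) + g_v(x) + h(x)]$ is guaranteed, directly from the definition of $h$; the theorem asserts equality at a maximizer, which this sketch does not compute.

```python
import numpy as np

def theorem1_potentials(p_uvx, qy, qz):
    """f[u, x], g[v, x] and h[x] from Theorem 1 (natural logarithms).

    p_uvx: joint pmf of shape (|U|, |V|, |X|) with p(u, v) > 0 for all (u, v);
    qy: (|X|, |Y|) and qz: (|X|, |Z|), both with strictly positive entries.
    """
    p_uv = p_uvx.sum(axis=2)
    p_uy = np.einsum("uvx,xy->uy", p_uvx, qy)
    p_vz = np.einsum("uvx,xz->vz", p_uvx, qz)
    f = np.log(p_uy) @ qy.T          # f[u, x] = sum_y q(y|x) log p(u, y)
    g = np.log(p_vz) @ qz.T          # g[v, x] = sum_z q(z|x) log p(v, z)
    # h(x) = min over (u', v') of log p(u', v') - f_{u'}(x) - g_{v'}(x)
    h = (np.log(p_uv)[:, :, None] - f[:, None, :] - g[None, :, :]).min(axis=(0, 1))
    return f, g, h

rng = np.random.default_rng(1)
qy = rng.dirichlet(np.ones(3), size=3)                       # q(y|x), |X| = |Y| = 3
qz = rng.dirichlet(np.ones(2), size=3)                       # q(z|x), |Z| = 2
p_uvx = rng.dirichlet(np.ones(2 * 2 * 3)).reshape(2, 2, 3)   # not a maximizer
f, g, h = theorem1_potentials(p_uvx, qy, qz)
score = f[:, None, :] + g[None, :, :] + h[None, None, :]      # score[u, v, x]
gap = np.log(p_uvx.sum(axis=2)) - score.max(axis=2)           # >= 0 by definition of h
print(gap)
```

At least one entry of `gap` is zero (the pair attaining the minimum in $h$); Theorem 1 says that at a maximizer every entry is zero.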

Discussion 1: These constraints imply restrictions on the maximizers. The second bullet implies that one cannot find distinct $u_0, u_1$ in $\mathcal{U}$, distinct $v_0, v_1$ in $\mathcal{V}$ and distinct $x_0, x_1$ in $\mathcal{X}$ such that $p(x_0|u_0,v_0) = p(x_0|u_1,v_1) = p(x_1|u_1,v_0) = p(x_1|u_0,v_1) = 1$.¹

Next, assume that all we know about the mapping pattern is that $x_0 = \xi(u_0,v_0) = \xi(u_1,v_1)$ for some $x_0$. Then the third bullet implies that $p(u_0,v_0)p(u_1,v_1) \le p(u_1,v_0)p(u_0,v_1)$. This holds since

$$\begin{aligned}
\log p(u_0,v_0) + \log p(u_1,v_1) &= f_{u_0}(x_0) + g_{v_0}(x_0) + h(x_0) + f_{u_1}(x_0) + g_{v_1}(x_0) + h(x_0)\\
&= f_{u_0}(x_0) + g_{v_1}(x_0) + h(x_0) + f_{u_1}(x_0) + g_{v_0}(x_0) + h(x_0)\\
&\le \max_x \big[f_{u_0}(x) + g_{v_1}(x) + h(x)\big] + \max_x \big[f_{u_1}(x) + g_{v_0}(x) + h(x)\big]\\
&= \log p(u_0,v_1) + \log p(u_1,v_0).
\end{aligned}$$

The first equality uses both statements of the third bullet: since $(u_0,v_0)$ and $(u_1,v_1)$ are mapped to $x_0$, the symbol $x_0$ attains the maximum for both pairs.

¹ Let the mapping $\xi'$ be equal to $\xi$ except that $(u_0,v_0)$ and $(u_1,v_1)$ are mapped to $x_1$ (instead of $x_0$), and $(u_1,v_0)$ and $(u_0,v_1)$ are mapped to $x_0$ (instead of $x_1$). Figure 1 illustrates this. The mapping $\xi'$ has the same profile vector as $\xi$. Thus we can write the original profile as a convex combination of other profiles (the condition in the displayed equation of the second bullet is violated for the choice $M = 1$, $\xi_1 = \xi'$ and $\alpha_1 = 1$). Thus the second bullet implies that this cannot happen. Similarly, the mapping shown in Figure 2 cannot occur because there is another mapping with the same profile.

Fig. 1. If we have a mapping with the XOR structure, we can get another mapping with the same profile by switching $x_0$ and $x_1$ in four cells of the mapping.

Fig. 2. Another mapping that cannot occur because one can find another mapping with the same profile.

2) Sum rate evaluation: In this subsection we turn to the evaluation of the whole sum-rate expression of Marton's inner bound (including the $W$ terms). We need the following definition: for any $\lambda \in [0,1]$, let

$$T_\lambda = \max_{p(u,v,w,x)} \Big(\lambda I(W;Y) + (1-\lambda) I(W;Z) + I(U;Y|W) + I(V;Z|W) - I(U;V|W)\Big).$$

Computing the sum rate for Marton's inner bound is closely related to the above maximization problem for $\lambda \in [0,1]$:

Claim 2: The maximum of the sum rate for Marton's inner bound is equal to $\min_{\lambda \in [0,1]} T_\lambda$.

Since the original submission of the conference version of this paper, some interesting properties of $T_\lambda$, such as its convexity in $\lambda$, its connection to the outer bound, and its factorization, have been investigated in [11] and [12]. An alternative proof of Claim 2 using a theorem by Terkelsen is also reported in



The main theorem of this section restricts the search space for computing $T_\lambda$. In this section, we only deal with broadcast channels $q(y,z|x)$ with strictly positive transition matrices, i.e., when $q(y|x) > 0$ and $q(z|x) > 0$ for all $x, y, z$. In order to evaluate $T_\lambda$ when $q(y|x)$ or $q(z|x)$ becomes zero for some $y$ or $z$, one can use the continuity of $T_\lambda$ in $q(y,z|x)$ and take the limit of $T_\lambda$ for a sequence of channels with positive entries converging to the desired channel. The reason for dealing with this class of broadcast channels should become clear from the following lemma, which is a corollary of the first bullet of Theorem 1.

Lemma 1: Take an arbitrary broadcast channel $q(y,z|x)$ with strictly positive transition matrices (i.e., $q(y|x) > 0$ and $q(z|x) > 0$ for all $x, y, z$). Let $p(u,v,w,x)$ be an arbitrary joint distribution maximizing $T_\lambda$ for some $\lambda \in [0,1]$, where $H(X|U,V,W) = 0$. If $p(u,w)$ and $p(v,w)$ are positive for some triple $(u,v,w)$, then it must be the case that $p(u,v,w) > 0$, $p(u,w,y) > 0$ and $p(v,w,z) > 0$ for all $y$ and $z$.

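Theorem 2: Take an arbitrary irreducible broadcast channel $q(y,z|x)$ with strictly positive transition matrices. In computing $T_\lambda$ for some $\lambda \in [0,1]$, it suffices to take the maximum over auxiliary random variables $p(u,v,w,x)q(y,z|x)$ simultaneously satisfying the following constraints:

• $|\mathcal{U}| \le \min(|\mathcal{X}|, |\mathcal{Y}|)$, $|\mathcal{V}| \le \min(|\mathcal{X}|, |\mathcal{Z}|)$, $|\mathcal{W}| \le |\mathcal{X}|$.

• $H(X|UVW) = 0$. Given $w$ with $p(w) > 0$, we use $x = \xi^{(w)}(u,v)$ to denote the deterministic mapping from $\mathcal{U}_w \times \mathcal{V}_w$ to $\mathcal{X}$. Here $\mathcal{U}_w$ is the set of $u \in \mathcal{U}$ such that $p(u|w) > 0$, and $\mathcal{V}_w$ is the set of $v \in \mathcal{V}$ such that $p(v|w) > 0$.

• For arbitrary $w$ such that $p(w) > 0$, the profile vector of the mapping $\xi^{(w)}$, $v_{\xi^{(w)}}$, cannot be written as

$$v_{\xi^{(w)}} = \sum_{t=1}^{M} \alpha_t v_{\xi_t},$$

where $\xi_t$ (for $t = 1, 2, \ldots, M$) are deterministic mappings from $\mathcal{U}_w \times \mathcal{V}_w$ to $\mathcal{X}$ not equal to $\xi^{(w)}$, and the $\alpha_t$ are non-negative numbers adding up to one, i.e., $\sum_{t=1}^{M} \alpha_t = 1$.

• For arbitrary $w$ such that $p(w) > 0$, let the functions

$f_{u,w} : \mathcal{X} \to \mathbb{R}$ for every $u \in \mathcal{U}_w$,
$g_{v,w} : \mathcal{X} \to \mathbb{R}$ for every $v \in \mathcal{V}_w$,
and $h_w : \mathcal{X} \to \mathbb{R}$

be defined by

$$f_{u,w}(x) = \sum_y q(y|x) \log p(uy|w),$$
$$g_{v,w}(x) = \sum_z q(z|x) \log p(vz|w),$$
$$h_w(x) = \min_{u' \in \mathcal{U}_w,\, v' \in \mathcal{V}_w} \big(\log p(u'v'|w) - f_{u',w}(x) - g_{v',w}(x)\big).$$

These definitions make sense because of Lemma 1. Then, for any $u \in \mathcal{U}_w$ and $v \in \mathcal{V}_w$, the following two statements hold:

$$\log p(uv|w) = \max_x \big[f_{u,w}(x) + g_{v,w}(x) + h_w(x)\big],$$

and

$$p(x_0|u,v,w) = 1 \text{ for some } x_0 \in \mathcal{X} \implies x_0 \in \arg\max_x \big[f_{u,w}(x) + g_{v,w}(x) + h_w(x)\big].$$

• Given any $w$, the random variables $U_w, V_w, X_w, Y_w, Z_w$, distributed according to $p(u,v,x,y,z|w)$, satisfy the following:

$$I(\tilde{U};Y_w) \ge I(\tilde{U};V_w Z_w) \text{ for any } \tilde{U} \to U_w \to V_w X_w Y_w Z_w,$$
$$I(\tilde{V};Z_w) \ge I(\tilde{V};U_w Y_w) \text{ for any } \tilde{V} \to V_w \to U_w X_w Y_w Z_w.$$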

Discussion 2: The first constraint imposes cardinality bounds on $|\mathcal{U}|$ and $|\mathcal{V}|$ that are better than those reported in [8]. However, we only claim the improved cardinality bounds for $T_\lambda$ and not for the whole of Marton's inner bound. The second constraint is not new and can be found in [8]. The other constraints are useful in restricting the search space because of the constraints they impose on $p(u,v,w,x)$. For instance, the third and fourth bullets restrict the set of possible mappings, as discussed in the previous subsection.

C. Optimality along certain directions 

In order to state the main result of this section we need the 
following definition: 

Definition 3 [7]: Let $\mathcal{C}_{d_1}(q(y,z|x))$ and $\mathcal{C}_{d_2}(q(y,z|x))$ denote the degraded message set capacity regions, i.e., when $R_1 = 0$ and $R_2 = 0$, respectively. The capacity region $\mathcal{C}_{d_1}(q(y,z|x))$ is the set of non-negative rate pairs $(R_0, R_2)$ satisfying

$$R_0 \le I(W;Y),$$
$$R_2 \le I(X;Z|W),$$
$$R_0 + R_2 \le I(X;Z),$$

for some random variables $(W,X,Y,Z) \sim p(w,x)q(y,z|x)$.

The capacity region $\mathcal{C}_{d_2}(q(y,z|x))$ is defined similarly.
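We now state the main result of this section:

Theorem 3: For a broadcast channel $q(y,z|x)$ and real numbers $\lambda_0$, $\lambda_1$ and $\lambda_2$ such that $\lambda_0 \ge \lambda_1 + \lambda_2$,

$$\max_{(R_0,R_1,R_2) \in \mathcal{C}(q(y,z|x))} (\lambda_0 R_0 + \lambda_1 R_1 + \lambda_2 R_2) = \max\Big\{\max_{(R_0,R_2) \in \mathcal{C}_{d_1}(q(y,z|x))} (\lambda_0 R_0 + \lambda_2 R_2),\ \max_{(R_0,R_1) \in \mathcal{C}_{d_2}(q(y,z|x))} (\lambda_0 R_0 + \lambda_1 R_1)\Big\},$$

where $\mathcal{C}_{d_1}(q(y,z|x))$ and $\mathcal{C}_{d_2}(q(y,z|x))$ are the degraded message set capacity regions for the given channel.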

Corollary 1: The above theorem essentially says that if $\lambda_0 \ge \lambda_1 + \lambda_2$, then a maximum of $\lambda_0 R_0 + \lambda_1 R_1 + \lambda_2 R_2$ over triples $(R_0, R_1, R_2)$ in the capacity region occurs when either $R_1 = 0$ or $R_2 = 0$.
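The reason the condition $\lambda_0 \ge \lambda_1 + \lambda_2$ suffices can be seen from a one-line computation (the rate-transfer step used in the proof of Theorem 3). Writing $m = \min\{R_1, R_2\}$ and moving $m$ units of rate from each private message into the common message changes the objective by

$$\lambda_0 (R_0 + m) + \lambda_1 (R_1 - m) + \lambda_2 (R_2 - m) - (\lambda_0 R_0 + \lambda_1 R_1 + \lambda_2 R_2) = m\,(\lambda_0 - \lambda_1 - \lambda_2) \ \ge\ 0,$$

so, provided the shifted triple remains achievable, it is never worse than the original one and has $R_1 = 0$ or $R_2 = 0$.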

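Remark 1: Since $\mathcal{C}_{d_1}(q(y,z|x)) \cup \mathcal{C}_{d_2}(q(y,z|x)) \subseteq \mathcal{C}_M(q(y,z|x)) \subseteq \mathcal{C}(q(y,z|x))$ (with the rate pairs in $\mathcal{C}_{d_1}$ and $\mathcal{C}_{d_2}$ identified with triples having $R_1 = 0$ and $R_2 = 0$, respectively), Theorem 3 implies that Marton's inner bound is tight along the direction of such $(\lambda_0, \lambda_1, \lambda_2)$, i.e.,

$$\max_{(R_0,R_1,R_2) \in \mathcal{C}(q(y,z|x))} (\lambda_0 R_0 + \lambda_1 R_1 + \lambda_2 R_2) = \max_{(R_0,R_1,R_2) \in \mathcal{C}_M(q(y,z|x))} (\lambda_0 R_0 + \lambda_1 R_1 + \lambda_2 R_2)$$

whenever $\lambda_0 \ge \lambda_1 + \lambda_2$.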

D. An achievable region 

Since capacity is defined in the limit of large block length, 
it is natural to expect that optimal coding schemes have 
an invariant structure with respect to shifts in time. This 
suggests that capacity should be expressed via a formula 
that has a fixed-point character, namely it should involve 
joint distributions that are invariant under a time shift. The 
following theorem is a proposed inner bound along these lines. 

Theorem 4: For a broadcast channel $q(y,z|x)$, consider two i.i.d. copies $(U_1, V_1, W_1)$ and $(U_2, V_2, W_2)$ and a conditional pmf $r(x|u_1,v_1,w_1,u_2,v_2,w_2)$. Assume that $U_1, V_1, W_1, U_2, V_2, W_2, X_1, Y_1, Z_1, X_2, Y_2, Z_2$ are distributed according to

$$p(u_1,v_1,w_1,u_2,v_2,w_2,x_1,y_1,z_1,x_2,y_2,z_2) = p(u_1,v_1,w_1)\,p(u_2,v_2,w_2)\, r(x_2|u_1,v_1,w_1,u_2,v_2,w_2)\, q(y_2,z_2|x_2)\, \tilde{r}(x_1|u_1,v_1,w_1)\, q(y_1,z_1|x_1),$$

where $\tilde{r}(x|u,v,w)$ is defined as

$$\tilde{r}(x|u,v,w) = \sum_{u' \in \mathcal{U},\, v' \in \mathcal{V},\, w' \in \mathcal{W}} r(x|u',v',w',u,v,w)\, p(u',v',w').$$

Then a rate triple $(R_0, R_1, R_2)$ is achievable if

$$R_0, R_1, R_2 \ge 0,$$
$$R_0 + R_1 \le I(U_2W_2; Y_1Y_2U_1W_1),$$
$$R_0 + R_2 \le I(V_2W_2; Z_1Z_2V_1W_1),$$
$$R_0 + R_1 + R_2 \le I(V_2; Z_1Z_2V_1W_1|W_2) + I(U_2W_2; Y_1Y_2U_1W_1) - I(U_2;V_2|W_2),$$
$$R_0 + R_1 + R_2 \le I(U_2; Y_1Y_2U_1W_1|W_2) + I(V_2W_2; Z_1Z_2V_1W_1) - I(U_2;V_2|W_2),$$
$$2R_0 + R_1 + R_2 \le I(U_2W_2; Y_1Y_2U_1W_1) + I(V_2W_2; Z_1Z_2V_1W_1) - I(U_2;V_2|W_2),$$

for some $U_1, V_1, W_1, U_2, V_2, W_2, X_1, X_2$ that satisfy the above conditions.

Remark 2: The above inner bound reduces to Marton's inner bound if the conditional distribution satisfies $r(x|u_1,v_1,w_1,u_2,v_2,w_2) = r(x|u_2,v_2,w_2)$, i.e., $U_1V_1W_1 \to U_2V_2W_2 \to X$ form a Markov chain.
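The following Python/NumPy sketch (our own illustration; the function name `r_tilde` and the random test kernel are assumptions) computes $\tilde{r}$ from $r$ and $p(u,v,w)$ with a single `einsum`. It also checks a consequence of the construction: $(U_1,V_1,W_1,X_1)$ and $(U_2,V_2,W_2,X_2)$ have the same joint law, because summing the joint pmf over $(u_1,v_1,w_1)$ gives $p(u_2,v_2,w_2,x_2) = p(u_2,v_2,w_2)\,\tilde{r}(x_2|u_2,v_2,w_2)$. This is the time-shift invariance that motivates the proposal. When $r$ ignores its first three arguments, as in Remark 2, $\tilde{r}$ reduces to $r$.

```python
import numpy as np

def r_tilde(r, p_uvw):
    """r~(x|u,v,w) = sum_{u',v',w'} r(x|u',v',w',u,v,w) p(u',v',w').

    r has shape (U, V, W, U, V, W, X): axes 0-2 index the earlier triple
    (u', v', w'), axes 3-5 the current triple (u, v, w), axis 6 the input x.
    """
    return np.einsum("abcdefx,abc->defx", r, p_uvw)

rng = np.random.default_rng(2)
nU, nV, nW, nX = 2, 2, 2, 3
p_uvw = rng.dirichlet(np.ones(nU * nV * nW)).reshape(nU, nV, nW)
r = rng.dirichlet(np.ones(nX), size=(nU, nV, nW, nU, nV, nW))

rt = r_tilde(r, p_uvw)
law_slot1 = p_uvw[..., None] * rt                                # p(u1, v1, w1, x1)
law_slot2 = np.einsum("abc,def,abcdefx->defx", p_uvw, p_uvw, r)  # p(u2, v2, w2, x2)
print(np.allclose(law_slot1, law_slot2))  # True

# Remark 2: a kernel that ignores (u1, v1, w1) is left unchanged.
base = rng.dirichlet(np.ones(nX), size=(nU, nV, nW))
r_markov = np.broadcast_to(base, r.shape)
print(np.allclose(r_tilde(r_markov, p_uvw), base))  # True
```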

III. Proofs 

Proof of Claim 1: Consider the degraded broadcast channel $p(y,z|x) = p(y|x)p(z|y)$, where the channel from $X$ to $Y$ is a BSC(0.3) and the channel from $Y$ to $Z$ is as follows: $p_{Z|Y}(0|0) = 0.6$, $p_{Z|Y}(1|0) = 0.4$, $p_{Z|Y}(0|1) = 0$, $p_{Z|Y}(1|1) = 1$. We show that the private message capacity region for this channel is strictly larger than Marton's inner bound without $W$.

We first sketch the outline of the proof intuitively: take a non-negative real $\alpha$ and consider the maximum of $R_1 + \alpha R_2$ over the pairs $(R_1, R_2)$ in the capacity region. Since the broadcast channel is degraded, the maximum is equal to $\max_{V \to X \to YZ} I(X;Y|V) + \alpha I(V;Z)$. Since $X \to Y \to Z$, when the weight of the degraded receiver is less than or equal to 1, an optimal $V$ is constant (corresponding to $R_2 = 0$). As we gradually increase $\alpha$ beyond one, the optimal $V$ gradually moves from a constant random variable to $X$ (corresponding to $R_1 = 0$). Now let us consider the maximum of $R_1 + \alpha R_2$ over the pairs $(R_1, R_2)$ in Marton's inner bound without the auxiliary random variable $W$. The latter maximum is equal to $\max I(U;Y) + \alpha I(V;Z) - I(U;V)$. When $\alpha \le 1$, it is optimal to take $U = X$ and $V$ constant, and to dedicate all the rate to the stronger receiver. Simulation results, however, indicate that as we increase $\alpha$ beyond one in the problem of maximizing $I(U;Y) + \alpha I(V;Z) - I(U;V)$, the choice $U = X$, $V$ constant continues to be optimal up to a threshold. Beyond this threshold, the choice $U$ constant, $V = X$ suddenly becomes optimal, and stays optimal afterwards. In other words, unlike the gradual transition of the maximizing $V$ for the actual region, there is a sharp transition in the maximizing $V$ for Marton's inner bound without $W$.

In the following, we provide a more detailed proof. The maximum of $R_1 + 2.4R_2$ over pairs $(R_1, R_2)$ in the capacity region is equal to $\max_{V \to X \to YZ} I(X;Y|V) + 2.4\, I(V;Z)$. Take the joint pmf $p(v,x)$ to be as follows: $P(V=0, X=0) = 0$, $P(V=0, X=1) = 0.41$, $P(V=1, X=0) = 0.48$, $P(V=1, X=1) = 0.11$. For this choice of $p(v,x)$, $I(X;Y|V) + 2.4\, I(V;Z) = 0.1229\ldots$. Therefore the maximum of $R_1 + 2.4R_2$ is at least $0.1229\ldots$. The maximum of $R_1 + 2.4R_2$ over Marton's inner bound without $W$ is equal to $\sup_{UV \to X \to YZ} I(U;Y) + 2.4\, I(V;Z) - I(U;V)$. Using the perturbation method of [8], one can bound the cardinalities of $\mathcal{U}$ and $\mathcal{V}$ from above by $|\mathcal{X}|$ and further assume that $X$ is a deterministic function of $(U,V)$. This makes the domain compact, implying that the above supremum is indeed a maximum.

Since $X$ is a binary random variable, we need to search over binary random variables $U$, $V$. Numerical simulations show that the maximum is equal to $0.1215\ldots < 0.1229\ldots$ and occurs when $X = V$ and $U$ is constant. Therefore Marton's inner bound without $W$ is not tight for this broadcast channel.

∎
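The two numbers in this proof can be checked with a few lines of Python/NumPy. The sketch below is our own, with a grid search standing in for the optimization over $p(x)$: it evaluates $I(X;Y|V) + 2.4\, I(V;Z)$ for the stated $p(v,x)$, and the value $2.4 \max_{p(x)} I(X;Z)$ of the Marton candidate with $U$ constant and $V = X$. It does not repeat the full search over binary $(U,V)$ reported above, and it should produce values close to the quoted $0.1229\ldots$ and $0.1215\ldots$.

```python
import numpy as np

def h2(p):
    """Binary entropy in bits; endpoints are clipped to avoid log(0)."""
    p = np.clip(np.asarray(p, dtype=float), 1e-15, 1 - 1e-15)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

p_y0_given_x = np.array([0.7, 0.3])        # BSC(0.3): P(Y=0|X=0), P(Y=0|X=1)
p_z0_given_x = 0.6 * p_y0_given_x          # Z = 0 only if Y = 0, then w.p. 0.6
H_Y_given_X = h2(0.3)

# Superposition (degraded-channel) choice from the proof: rows V, columns X.
p_vx = np.array([[0.00, 0.41],
                 [0.48, 0.11]])
p_v = p_vx.sum(axis=1)
p_x_given_v = p_vx / p_v[:, None]
p_y0_given_v = p_x_given_v @ p_y0_given_x
p_z0_given_v = p_x_given_v @ p_z0_given_x
I_XY_given_V = p_v @ h2(p_y0_given_v) - H_Y_given_X       # H(Y|V) - H(Y|X,V)
I_VZ = h2(p_v @ p_z0_given_v) - p_v @ h2(p_z0_given_v)    # H(Z) - H(Z|V)
superposition_value = float(I_XY_given_V + 2.4 * I_VZ)

# Marton without W, candidate U constant and V = X: 2.4 * max over p(x) of I(X;Z).
p0 = np.linspace(0.0, 1.0, 200001)                         # grid for P(X = 0)
pz0 = p0 * p_z0_given_x[0] + (1 - p0) * p_z0_given_x[1]
I_XZ = h2(pz0) - (p0 * h2(p_z0_given_x[0]) + (1 - p0) * h2(p_z0_given_x[1]))
marton_candidate = float(2.4 * I_XZ.max())

print(f"superposition: {superposition_value:.5f}  Marton candidate: {marton_candidate:.5f}")
```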

Proof of Claim 2: In order to prove the claim, one needs to argue that the following exchange of max and min is legitimate:

$$\max_{p(u,v,w,x)} \min_{\lambda \in [0,1]} \lambda I(W;Y) + (1-\lambda) I(W;Z) + I(U;Y|W) + I(V;Z|W) - I(U;V|W) = \min_{\lambda \in [0,1]} \max_{p(u,v,w,x)} \lambda I(W;Y) + (1-\lambda) I(W;Z) + I(U;Y|W) + I(V;Z|W) - I(U;V|W).$$

Let $R_{\text{Marton-Sum}}$ denote the sum rate for Marton's inner bound. We would like to show that $R_{\text{Marton-Sum}}$ is equal to $\min_{0 \le \lambda \le 1} T_\lambda$.

Let $\mathcal{D}$ be the union over all $p(u,v,w,x)$ of real pairs $(d_1, d_2)$ satisfying

$$d_1 \le I(W;Y) + I(U;Y|W) + I(V;Z|W) - I(U;V|W),$$
$$d_2 \le I(W;Z) + I(U;Y|W) + I(V;Z|W) - I(U;V|W).$$

We claim that this region is convex. Take two points $(d_1, d_2)$ and $(d_1', d_2')$ in the region. Corresponding to these are joint distributions $p(u_1,v_1,w_1,x_1)q(y_1,z_1|x_1)$ and $p(u_2,v_2,w_2,x_2)q(y_2,z_2|x_2)$. Take a uniform binary random variable $Q \in \{1,2\}$ independent of all the previously defined random variables. Set $U = U_Q$, $V = V_Q$, $W = (Q, W_Q)$, $X = X_Q$, $Y = Y_Q$, $Z = Z_Q$. We will then have

$$\begin{aligned}
&I(W;Y) + I(U;Y|W) + I(V;Z|W) - I(U;V|W)\\
&= I(W_Q,Q;Y_Q) + I(U_Q;Y_Q|W_Q,Q) + I(V_Q;Z_Q|W_Q,Q) - I(U_Q;V_Q|W_Q,Q)\\
&\ge I(W_Q;Y_Q|Q) + I(U_Q;Y_Q|W_Q,Q) + I(V_Q;Z_Q|W_Q,Q) - I(U_Q;V_Q|W_Q,Q)\\
&= \tfrac{1}{2}\big(I(W_1;Y_1) + I(U_1;Y_1|W_1) + I(V_1;Z_1|W_1) - I(U_1;V_1|W_1)\big)\\
&\quad + \tfrac{1}{2}\big(I(W_2;Y_2) + I(U_2;Y_2|W_2) + I(V_2;Z_2|W_2) - I(U_2;V_2|W_2)\big)\\
&\ge \tfrac{1}{2}(d_1 + d_1').
\end{aligned}$$

Similarly,

$$I(W;Z) + I(U;Y|W) + I(V;Z|W) - I(U;V|W) \ge \tfrac{1}{2}(d_2 + d_2').$$

Thus the point $\big(\tfrac{1}{2}(d_1 + d_1'), \tfrac{1}{2}(d_2 + d_2')\big)$ is in the region. (Taking $P(Q=1) = \theta$ instead of $1/2$, the same argument gives the point $\theta(d_1, d_2) + (1-\theta)(d_1', d_2')$.) Thus $\mathcal{D}$ is convex.
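The role of convexity (and hence of the time-sharing variable $Q$ folded into $W$) in exchanging max and min can be illustrated with a toy computation. In the Python/NumPy sketch below, the three points are hypothetical values of $(d_1, d_2)$, not derived from any channel, and $\lambda$ is restricted to a grid. Over this finite set, $\max \min\{d_1, d_2\}$ is strictly smaller than $\min_\lambda \max (\lambda d_1 + (1-\lambda) d_2)$; adding the time-sharing midpoint of the first two points closes the gap, in line with the convexity of $\mathcal{D}$ established above.

```python
import numpy as np

# Hypothetical (d1, d2) pairs attained by three candidate distributions.
points = np.array([[2.0, 0.0],
                   [0.0, 2.0],
                   [0.5, 0.5]])

lams = np.linspace(0.0, 1.0, 1001)

def min_over_lambda_of_max(pts):
    """min over the lambda grid of max_i (lambda * d1_i + (1 - lambda) * d2_i)."""
    vals = lams[:, None] * pts[:, 0] + (1 - lams[:, None]) * pts[:, 1]
    return vals.max(axis=1).min()

def max_of_min(pts):
    """max_i min(d1_i, d2_i)."""
    return pts.min(axis=1).max()

# Time sharing between the first two candidates with a uniform Q.
hull = np.vstack([points, 0.5 * (points[0] + points[1])])

print(max_of_min(points), min_over_lambda_of_max(points), max_of_min(hull))
```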

Next, note that the point $(R_{\text{Marton-Sum}}, R_{\text{Marton-Sum}})$ is in $\mathcal{D}$. We claim that it is a boundary point of $\mathcal{D}$. If it were an interior point, there would exist $\epsilon > 0$ such that $(R_{\text{Marton-Sum}} + \epsilon, R_{\text{Marton-Sum}} + \epsilon)$ is in $\mathcal{D}$. This implies the existence of some $p(u,v,w,x)$ for which

$$R_{\text{Marton-Sum}} + \epsilon \le I(W;Y) + I(U;Y|W) + I(V;Z|W) - I(U;V|W),$$
$$R_{\text{Marton-Sum}} + \epsilon \le I(W;Z) + I(U;Y|W) + I(V;Z|W) - I(U;V|W).$$

This implies that

$$R_{\text{Marton-Sum}} < \min\{I(W;Y), I(W;Z)\} + I(U;Y|W) + I(V;Z|W) - I(U;V|W)$$

for some $p(u,v,w,x)$, which is a contradiction.

Using the supporting hyperplane theorem and the fact that $\mathcal{D}$ is convex and closed, one can conclude that there exists a supporting hyperplane to $\mathcal{D}$ at the boundary point $(R_{\text{Marton-Sum}}, R_{\text{Marton-Sum}})$. We claim that this supporting hyperplane must have the equation $\lambda^* d_1 + (1-\lambda^*) d_2 = T_{\lambda^*}$ for some $\lambda^* \in [0,1]$. The proof is as follows. Any supporting hyperplane has the form $\lambda^* d_1 + (1-\lambda^*) d_2 = k$ for some real $\lambda^*$ and real $k$ (a normal vector $(a,b)$ with $a + b = 0$ is ruled out by the same argument as below). We claim that $\lambda^*$ must be in $[0,1]$ and $k = T_{\lambda^*}$. Assume, for instance, that $\lambda^* < 0$. We know that $\mathcal{D}$ must be entirely contained in one of the two closed half-spaces determined by the hyperplane. Note that for every $M > 0$ the points $(0,0)$, $(-M,0)$ and $(0,-M)$ are in $\mathcal{D}$ (take $p(u,v,w,x)$ satisfying $I(U;V|W) = 0$ in the definition of $\mathcal{D}$, so that both right-hand sides are non-negative). The values of $\lambda^* d_1 + (1-\lambda^*) d_2$ at these points are $0$, $-\lambda^* M$ and $-(1-\lambda^*) M$, which tend to $0$, $+\infty$ and $-\infty$, respectively, as $M \to \infty$. Thus $\mathcal{D}$ cannot possibly be entirely contained in one of the two closed half-spaces determined by the hyperplane. Similarly, the case $1 - \lambda^* < 0$ can be refuted. Therefore $\lambda^*$ must be in $[0,1]$. Since the points $(-M,0)$ and $(0,-M)$ are in $\mathcal{D}$ for all $M > 0$, the half-space determined by the hyperplane that contains $\mathcal{D}$ is $\lambda^* d_1 + (1-\lambda^*) d_2 \le k$. Since the hyperplane touches $\mathcal{D}$ at a point of $\mathcal{D}$, $k$ must be equal to $\max_{(d_1,d_2) \in \mathcal{D}} \lambda^* d_1 + (1-\lambda^*) d_2$. The latter is equal to $T_{\lambda^*}$. Thus the supporting hyperplane at the boundary point $(R_{\text{Marton-Sum}}, R_{\text{Marton-Sum}})$ has the equation $\lambda^* d_1 + (1-\lambda^*) d_2 = T_{\lambda^*}$ for some $\lambda^* \in [0,1]$.

Since $(R_{\text{Marton-Sum}}, R_{\text{Marton-Sum}})$ lies on this hyperplane, $\lambda^* R_{\text{Marton-Sum}} + (1-\lambda^*) R_{\text{Marton-Sum}} = T_{\lambda^*}$, which implies that $R_{\text{Marton-Sum}} = T_{\lambda^*}$ for some $\lambda^* \in [0,1]$. Therefore

$$\min_{0 \le \lambda \le 1} T_\lambda \le R_{\text{Marton-Sum}}.$$

On the other hand, for every $\lambda$, $T_\lambda \ge R_{\text{Marton-Sum}}$, since $\lambda a + (1-\lambda) b \ge \min\{a, b\}$. Therefore

$$\min_{0 \le \lambda \le 1} T_\lambda \ge R_{\text{Marton-Sum}}.$$

∎

Proof of Theorem 1: We begin by proving the first bullet. Fix $u$. Since $p(u) > 0$, there must exist some $x$ such that $p(u,x) > 0$. Since the transition matrices have positive entries and $p(u,y) \ge p(u,x)q(y|x)$, $p(u,y)$ is positive for all $y$. A similar argument proves that $p(v,z) > 0$ for all $v, z$. Next, assume that $p(u,v) = 0$ for some $(u,v)$. Take some $u', v'$ such that $p(u',v') > 0$. Let us reduce $p(u',v')$ by $\epsilon$ and increase $p(u,v)$ by $\epsilon$. Furthermore, let $(u,v)$ be mapped to the same $x$ that $(u',v')$ is mapped to; this ensures that the marginal distribution of $X$ is preserved. One can write

$$I(U;Y) + I(V;Z) - I(U;V) = H(Y) + H(Z) + H(UV) - H(UY) - H(VZ).$$

The only change in this expression comes from the change in $H(UV) - H(UY) - H(VZ)$. The derivative of $H(UV)$ with respect to $\epsilon$, at $\epsilon = 0$, is $+\infty$. But the derivatives of $H(UY)$ and $H(VZ)$ are finite, since $p(u,y)$, $p(u',y)$, $p(v,z)$ and $p(v',z)$ are positive for all $y$ and $z$. So the first derivative of $H(UV) - H(UY) - H(VZ)$ with respect to $\epsilon$, at $\epsilon = 0$, is positive. This is a contradiction, since $p(u,v,x)$ was assumed to maximize $I(U;Y) + I(V;Z) - I(U;V)$.
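The entropy identity used above follows by expanding each mutual information term:

$$\begin{aligned}
I(U;Y) + I(V;Z) - I(U;V) &= \big[H(U) + H(Y) - H(UY)\big] + \big[H(V) + H(Z) - H(VZ)\big] - \big[H(U) + H(V) - H(UV)\big]\\
&= H(Y) + H(Z) + H(UV) - H(UY) - H(VZ).
\end{aligned}$$

Since the perturbation keeps the distribution of $X$, and hence those of $Y$ and $Z$, fixed, $H(Y)$ and $H(Z)$ do not change.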



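We now prove the second bullet. Assume that $\mathcal{U} = \{u_1, u_2, \ldots, u_{|\mathcal{U}|}\}$ and $\mathcal{V} = \{v_1, v_2, \ldots, v_{|\mathcal{V}|}\}$. Let $\pi_{i,j} = p(u_i, v_j)$ for $i = 1, \ldots, |\mathcal{U}|$ and $j = 1, \ldots, |\mathcal{V}|$. From the first bullet we know that $\pi_{i,j} > 0$ for all $i$ and $j$. Let $\bar{\epsilon} = \min_{i,j} \pi_{i,j}$, and take some $\epsilon \in (0, \bar{\epsilon})$. Let $x = \xi_0(u,v)$ denote the deterministic mapping from $\mathcal{U} \times \mathcal{V}$ to $\mathcal{X}$.

We prove the statement by contradiction. Assume that

$$v_{\xi_0} = \sum_{t=1}^{M} \alpha_t v_{\xi_t}$$

for some mappings $\xi_t$ ($t = 1, 2, \ldots, M$) distinct from $\xi_0$ and non-negative numbers $\alpha_t$ adding up to one.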

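Let the random variables $T_{i,j}$ (for $i = 1, \ldots, |\mathcal{U}|$, $j = 1, \ldots, |\mathcal{V}|$) be $(M+1)$-ary random variables, mutually independent of each other and of $U, V, X, Y, Z$, satisfying

$$P(T_{i,j} = 0) = 1 - \frac{\epsilon}{\pi_{i,j}}, \qquad P(T_{i,j} = t) = \frac{\epsilon}{\pi_{i,j}}\,\alpha_t \quad \text{for } t = 1, 2, \ldots, M.$$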

Let $\tilde{X}$ be defined as follows:

• On the event that $(U,V) = (u_i, v_j)$, let $\tilde{X}$ be equal to $\xi_{T_{i,j}}(u_i, v_j)$. In other words, if $T_{i,j} = 0$, $\tilde{X}$ is equal to $\xi_0(u_i, v_j)$; if $T_{i,j} = 1$, $\tilde{X}$ is equal to $\xi_1(u_i, v_j)$; and so on.

We claim that $p(\tilde{X} = x|U = u_i) = p(X = x|U = u_i)$ for all $i = 1, 2, \ldots, |\mathcal{U}|$ and all $x$; and similarly $p(\tilde{X} = x|V = v_j) = p(X = x|V = v_j)$ for all $j = 1, 2, \ldots, |\mathcal{V}|$ and all $x$. This is proved in Appendix B. Note that the above property implies that $X$ and $\tilde{X}$ have the same marginal distribution.
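The claim proved in Appendix B can be checked numerically on a small instance. In the Python/NumPy sketch below, the $3 \times 3$ mapping $\xi_0$, its XOR-swapped companion $\xi_1$ (same profile vector, so $M = 1$ and $\alpha_1 = 1$) and the values $\pi_{i,j}$ are hypothetical choices of ours. The code builds the conditional law of $\tilde{X}$ given $U$ (and, by transposing, given $V$) from the distribution of $T_{i,j}$ above and compares it with that of $X$.

```python
import numpy as np

def cond_x_given_row(pi, tables, alphas, eps, num_x):
    """P(X~ = x | U = u_i) for the randomized mapping defined above.

    pi[i, j] = p(u_i, v_j) > 0; tables[0] is xi_0 and tables[t] is xi_t;
    alphas = (alpha_1, ..., alpha_M); P(T_ij = 0) = 1 - eps / pi[i, j] and
    P(T_ij = t) = (eps / pi[i, j]) * alpha_t for t >= 1.
    """
    n_rows, n_cols = pi.shape
    out = np.zeros((n_rows, num_x))
    for i in range(n_rows):
        for j in range(n_cols):
            probs = [1 - eps / pi[i, j]] + [eps / pi[i, j] * a for a in alphas]
            for t, prob_t in enumerate(probs):
                out[i, tables[t][i, j]] += pi[i, j] * prob_t
        out[i] /= pi[i].sum()
    return out

pi = np.array([[0.10, 0.15, 0.05],
               [0.20, 0.05, 0.10],
               [0.05, 0.10, 0.20]])
xi0 = np.array([[0, 1, 2],
                [1, 0, 2],
                [2, 2, 1]])
xi1 = xi0.copy()
xi1[:2, :2] = 1 - xi0[:2, :2]          # XOR swap: same profile vector as xi0
eps = 0.5 * pi.min()

rows_x = cond_x_given_row(pi, [xi0], [], 0.0, 3)            # X itself (T_ij = 0)
rows_xt = cond_x_given_row(pi, [xi0, xi1], [1.0], eps, 3)   # X~
cols_x = cond_x_given_row(pi.T, [xi0.T], [], 0.0, 3)
cols_xt = cond_x_given_row(pi.T, [xi0.T, xi1.T], [1.0], eps, 3)
print(np.allclose(rows_x, rows_xt), np.allclose(cols_x, cols_xt))  # True True
```

Replacing $\xi_1$ by a mapping with a different profile (for example, changing a single cell of $\xi_0$) breaks the equality.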

Let $\tilde{Y}$ and $\tilde{Z}$ be defined such that $UV(T_{i,j})_{i,j} \to \tilde{X} \to \tilde{Y}\tilde{Z}$, and the conditional law of $(\tilde{Y}, \tilde{Z})$ given $\tilde{X} = x$ is $q(y,z|x)$. Here $(T_{i,j})_{i,j}$ denotes the collection of all $T_{i,j}$ for all $i$ and $j$.

Without loss of generality, let us assume $\alpha_1 \ne 0$. Since the mapping $\xi_0(\cdot,\cdot)$ is not equal to $\xi_1(\cdot,\cdot)$, there must exist $(i,j)$ such that $\xi_0(u_i, v_j) \ne \xi_1(u_i, v_j)$. Let us label the input symbol $\xi_0(u_i, v_j)$ by $x_0$ and the input symbol $\xi_1(u_i, v_j)$ by $x_1$. We know that the channel is irreducible. Let us then assume that there is some $y$ such that $q(y|x_0) \ne q(y|x_1)$; the proof for the case when there is some $z$ such that $q(z|x_0) \ne q(z|x_1)$ is similar. Let $\tilde{U} = (U, T_{i,j})$ and $\tilde{V} = V$.

Since $p(\tilde{X} = x|U = u) = p(X = x|U = u)$ for all $u$ and $x$, and $p(\tilde{X} = x|V = v) = p(X = x|V = v)$ for all $v$ and $x$, we have

• $I(U;\tilde{Y}) = I(U;Y)$,

• $I(V;\tilde{Z}) = I(V;Z)$.

Therefore $I(\tilde{V};\tilde{Z}) = I(V;Z)$ and $I(\tilde{U};\tilde{Y}) = I(U;Y) + I(T_{i,j};\tilde{Y}|U)$. Furthermore, since $T_{i,j}$ is independent of $(U,V)$, we have $I(\tilde{U};\tilde{V}) = I(U;V)$. Therefore

$$I(\tilde{U};\tilde{Y}) + I(\tilde{V};\tilde{Z}) - I(\tilde{U};\tilde{V}) - \big(I(U;Y) + I(V;Z) - I(U;V)\big) = I(T_{i,j};\tilde{Y}|U).$$

Since $p(u,v,x)$ was maximizing $I(U;Y) + I(V;Z) - I(U;V)$ under the fixed marginal distribution on $X$ (which $\tilde{X}$ shares), we must have $I(T_{i,j};\tilde{Y}|U) = 0$. Therefore $I(T_{i,j};\tilde{Y}|U = u_i) = 0$ holds as well.

In Appendix C we prove that the following are true:

$$p(\tilde{X} = x_0|U = u_i, T_{i,j} = 0) \ne p(\tilde{X} = x_0|U = u_i, T_{i,j} = 1),$$
$$p(\tilde{X} = x_1|U = u_i, T_{i,j} = 0) \ne p(\tilde{X} = x_1|U = u_i, T_{i,j} = 1).$$

But for any $x \notin \{x_0, x_1\}$,

$$p(\tilde{X} = x|U = u_i, T_{i,j} = 0) = p(\tilde{X} = x|U = u_i, T_{i,j} = 1).$$

Remember that we assumed that there is some $y$ such that $q(y|x_0) \ne q(y|x_1)$. In Appendix D we show that

$$p(\tilde{Y} = y|U = u_i, T_{i,j} = 0) \ne p(\tilde{Y} = y|U = u_i, T_{i,j} = 1).$$

This implies that $\tilde{Y}$ and $T_{i,j}$ are not conditionally independent given $U = u_i$. Therefore $I(T_{i,j};\tilde{Y}|U = u_i) > 0$, which is a contradiction.

We now prove the third bullet. The proof begins by noting that the definition of $h(x)$ implies that, for any $(u,v,x)$,

$$h(x) \le \log p(u,v) - f_u(x) - g_v(x).$$

Therefore, for any $(u,v,x)$,

$$\log p(u,v) \ge f_u(x) + g_v(x) + h(x).$$

Thus,

$$\log p(u,v) \ge \max_x \big(f_u(x) + g_v(x) + h(x)\big). \qquad (11)$$

Note that the first partial derivative of $H(UV) - H(UY) - H(VZ)$ with respect to $p(u,v,x)$ is proportional to (and, with natural logarithms, equal to)

$$-\log p(u,v) - 1 + \sum_y q(y|x) \log p(u,y) + 1 + \sum_z q(z|x) \log p(v,z) + 1 = -\log p(u,v) + f_u(x) + g_v(x) + 1.$$
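The derivative formula can be checked by a central finite difference. The Python/NumPy sketch below is our own illustration with random channels and a random $p(u,v,x)$ (all hypothetical; the pmf is mixed with the uniform one to keep entries away from zero). It treats $p(u,v,x)$ as an unnormalized array, perturbs one entry, and compares the numerical derivative of $H(UV) - H(UY) - H(VZ)$ in nats with $-\ln p(u,v) + f_u(x) + g_v(x) + 1$.

```python
import numpy as np

def ent(a):
    """-sum a log a (nats) over positive entries; `a` need not sum to one."""
    a = a[a > 0]
    return float(-(a * np.log(a)).sum())

def objective_part(p_uvx, qy, qz):
    """H(UV) - H(UY) - H(VZ) as a function of the array p(u, v, x)."""
    p_uv = p_uvx.sum(axis=2)
    p_uy = np.einsum("uvx,xy->uy", p_uvx, qy)
    p_vz = np.einsum("uvx,xz->vz", p_uvx, qz)
    return ent(p_uv) - ent(p_uy) - ent(p_vz)

rng = np.random.default_rng(4)
qy = rng.dirichlet(np.ones(3), size=2)                          # q(y|x), |X| = 2, |Y| = 3
qz = rng.dirichlet(np.ones(2), size=2)                          # q(z|x), |Z| = 2
p = 0.5 * rng.dirichlet(np.ones(8)).reshape(2, 2, 2) + 0.5 / 8  # p(u, v, x)

u, v, x, delta = 1, 0, 1, 1e-6
bump = np.zeros_like(p)
bump[u, v, x] = delta
numeric = (objective_part(p + bump, qy, qz) - objective_part(p - bump, qy, qz)) / (2 * delta)

f_u = qy[x] @ np.log(np.einsum("uvx,xy->uy", p, qy)[u])   # f_u(x)
g_v = qz[x] @ np.log(np.einsum("uvx,xz->vz", p, qz)[v])   # g_v(x)
analytic = -np.log(p[u, v].sum()) + f_u + g_v + 1
print(numeric, analytic)
```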

Assume that the triple $(u,v,x)$ is such that $p(u,v,x) > 0$. Take some arbitrary $u'$ and $v'$. Reducing $p(u,v,x)$ by a small $\epsilon$ and increasing $p(u',v',x)$ by $\epsilon$ does not affect the marginal distribution of $X$ and hence should not increase the expression $H(UV) - H(UY) - H(VZ)$. Therefore the first derivative of $H(UV) - H(UY) - H(VZ)$ with respect to $p(u,v,x)$ must be greater than or equal to the first derivative of $H(UV) - H(UY) - H(VZ)$ with respect to $p(u',v',x)$. Thus,

$$-\log p(u,v) + f_u(x) + g_v(x) + 1 \ge -\log p(u',v') + f_{u'}(x) + g_{v'}(x) + 1.$$

In other words, for any arbitrary $u'$ and $v'$, we have

$$\log p(u,v) - f_u(x) - g_v(x) \le \log p(u',v') - f_{u'}(x) - g_{v'}(x).$$

Therefore

$$\log p(u,v) - f_u(x) - g_v(x) \le \min_{u',v'} \big(\log p(u',v') - f_{u'}(x) - g_{v'}(x)\big) = h(x).$$



Thus, \ogp{u,v) < fu{x) + gv{x) + h{x) whenever We show that 
p{u, u, a;) > 0. This together with equation ( [IT] ) imply that 



and 



log{p{u,v)) 



■ fu{x) + gv{x) + h{x), 



p{xq\u, v) ~ 1 for some xq E X ^ 
xo e argmax^fuix) + g^{x) + h{x). 



Proof of Lemma 7: This is a consequence of bullet one of Theorem 1. ■

Proof of Theorem 2: Among the pmfs $p(u,v,w,x)$ that maximize the expression $\lambda I(W;Y)+(1-\lambda)I(W;Z)+I(U;Y|W)+I(V;Z|W)-I(U;V|W)$, let $p_0(u,v,w,x)$ be the one that achieves the largest value of $I(W;Y)+I(W;Z)$. In Appendix A we prove that one can find $\tilde p(u,v,w,x)$ such that

• $\lambda I(W;Y)+(1-\lambda)I(W;Z)+I(U;Y|W)+I(V;Z|W)-I(U;V|W)$ has the same value under $\tilde p$ as under $p_0$,

• $I(W;Y)+I(W;Z)$ has the same value under $\tilde p$ as under $p_0$,

• $|\mathcal U|\le\min(|\mathcal X|,|\mathcal Y|)$,

• $|\mathcal V|\le\min(|\mathcal X|,|\mathcal Z|)$,

• $|\mathcal W|\le|\mathcal X|$,

• $H(X|UVW)=0$.

Thus the constraints in the first and second bullets are satisfied by $\tilde p(u,v,w,x)$. The second and third bullets of Theorem 1 imply that $\tilde p(u,v,w,x)$ automatically satisfies the third and fourth bullets of Theorem 2. In Appendix E we show that the fifth bullet of Theorem 2 holds for any joint distribution that maximizes the expression $\lambda I(W;Y)+(1-\lambda)I(W;Z)+I(U;Y|W)+I(V;Z|W)-I(U;V|W)$ and, at the same time, has the largest possible value of $I(W;Y)+I(W;Z)$. Thus it must also hold for $\tilde p(u,v,w,x)$. ■

Proof of Theorem [?]: It suffices to show that

$$\max_{(R_0,R_1,R_2)\in\mathcal C(q(y,z|x))}(\lambda_0R_0+\lambda_1R_1+\lambda_2R_2)\le\max\Big\{\max_{(R_0,R_2)\in\mathcal C_{d_2}(q(y,z|x))}(\lambda_0R_0+\lambda_2R_2),\ \max_{(R_0,R_1)\in\mathcal C_{d_1}(q(y,z|x))}(\lambda_0R_0+\lambda_1R_1)\Big\}.$$

The key step is to show the following. If $(R_0,R_1,R_2)$ is in the capacity region of a broadcast channel, then $(R_0+\min\{R_1,R_2\},\,R_1-\min\{R_1,R_2\},\,R_2-\min\{R_1,R_2\})$ is also in the capacity region. Since $\lambda_0\ge\lambda_1+\lambda_2$, we then have $\lambda_0(R_0+\min\{R_1,R_2\})+\lambda_1(R_1-\min\{R_1,R_2\})+\lambda_2(R_2-\min\{R_1,R_2\})\ge\lambda_0R_0+\lambda_1R_1+\lambda_2R_2$. Hence the maximum is attained at a point with $\min\{R_1,R_2\}=0$.

One can prove this property using the result of Willems [9], which shows that the maximal probability of error capacity region is equal to the average probability of error capacity region. Willems's proof of his result, however, is rather involved. Instead, we provide a simple direct proof. Consider an arbitrary code $(M_0,M_1,M_2,X^n,\epsilon)$. We show that

$$\frac{\lambda_0}{n}H(M_0)+\frac{\lambda_1}{n}H(M_1)+\frac{\lambda_2}{n}H(M_2)-O(\epsilon)\le\max\Big\{\max_{(R_0,R_2)\in\mathcal C_{d_2}(q(y,z|x))}(\lambda_0R_0+\lambda_2R_2),\ \max_{(R_0,R_1)\in\mathcal C_{d_1}(q(y,z|x))}(\lambda_0R_0+\lambda_1R_1)\Big\},$$

where $O(\epsilon)$ denotes a constant (depending only on $|\mathcal X|$, $|\mathcal Y|$, $|\mathcal Z|$) times $\epsilon$.

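Assume without loss of generality that $H(M_2)\le H(M_1)$, i.e., $R_2\le R_1$. Let $W=M_0M_2$, $\tilde X=X^n$, $\tilde Y=Y^n$, $\tilde Z=Z^n$. Note that $\tilde q(\tilde y,\tilde z|\tilde x)$ is the $n$-fold version of $q(y,z|x)$. Let us look at $\mathcal C_{d_1}(\tilde q(\tilde y,\tilde z|\tilde x))$ evaluated at the joint pmf $p(w,\tilde x)$:

$$\tilde R_0\le I(W;\tilde Z),$$
$$\tilde R_1\le I(\tilde X;\tilde Y|W),$$
$$\tilde R_0+\tilde R_1\le I(\tilde X;\tilde Y).$$

Note that, by Fano's inequality,

$$I(W;\tilde Z)=I(M_0M_2;Z^n)\ge H(M_0)+H(M_2)-O(n\epsilon),$$
$$I(\tilde X;\tilde Y|W)=I(X^n;Y^n|M_0M_2)\ge H(M_1)-O(n\epsilon),$$
$$I(\tilde X;\tilde Y)=I(X^n;Y^n)\ge H(M_0)+H(M_1)-O(n\epsilon).$$

Consider $\tilde R_0=H(M_0)+H(M_2)-O(n\epsilon)=n(R_0+R_2)-O(n\epsilon)$ and $\tilde R_1=H(M_1)-H(M_2)-O(n\epsilon)=n(R_1-R_2)-O(n\epsilon)$. These satisfy the three constraints above, so $(\tilde R_0,\tilde R_1)\in\mathcal C_{d_1}(\tilde q(\tilde y,\tilde z|\tilde x))$. Here $\tilde q$ is the $n$-fold version of $q(y,z|x)$, and $\mathcal C_{d_1}(q(y,z|x))$ is the degraded message set capacity region for $q(y,z|x)$. We must therefore have $\mathcal C_{d_1}(\tilde q(\tilde y,\tilde z|\tilde x))=n\cdot\mathcal C_{d_1}(q(y,z|x))$, where the multiplication is pointwise. Thus $(\tilde R_0/n,\tilde R_1/n)=(R_0+R_2-O(\epsilon),\,R_1-R_2-O(\epsilon))\in\mathcal C_{d_1}(q(y,z|x))$. We complete the proof by letting $\epsilon\to0$. This gives $(R_0+R_2,\,R_1-R_2)\in\mathcal C_{d_1}(q(y,z|x))$, so the triple $(R_0+R_2,\,R_1-R_2,\,0)$ is also in the capacity region. ■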
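The rate transfer used in the key step of the proof above can be written out explicitly. With $m=\min\{R_1,R_2\}$ the gain is

$$\lambda_0(R_0+m)+\lambda_1(R_1-m)+\lambda_2(R_2-m)-(\lambda_0R_0+\lambda_1R_1+\lambda_2R_2)=m\,(\lambda_0-\lambda_1-\lambda_2)\ge0.$$

This is nonnegative because $m\ge0$ and $\lambda_0\ge\lambda_1+\lambda_2$. It is strictly positive when $m>0$ and $\lambda_0>\lambda_1+\lambda_2$.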

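Proof of Theorem [?]: Consider a natural number $n$. Define the super symbols $\tilde X=X_1X_2\cdots X_n$, $\tilde Y=Y_1Y_2\cdots Y_n$, $\tilde Z=Z_1Z_2\cdots Z_n$, representing the $n$ inputs and $n$ outputs of the product broadcast channel

$$q^n(y_1y_2\cdots y_n,z_1z_2\cdots z_n|x_1x_2\cdots x_n)=\prod_{i=1}^n q(y_i,z_i|x_i).$$

Since the capacity region of the product channel $q^n(\tilde y,\tilde z|\tilde x)$ is $n$ times the capacity region of $q(y,z|x)$, we have

$$\frac1n\,\mathcal C_M\big(q^n(y_1y_2\cdots y_n,z_1z_2\cdots z_n|x_1x_2\cdots x_n)\big)\subseteq\mathcal C(q(y,z|x)),$$

where $\mathcal C_M$ denotes Marton's inner bound.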

Given an arbitrary joint pmf $p(u^n,v^n,w^n,x^n)$, one can then show that the following region is an inner bound to $\mathcal C(q(y,z|x))$:

$$R_0,R_1,R_2\ge0,$$
$$R_0+R_1\le\frac1n I(U^nW^n;Y^n), \qquad (12)$$
$$R_0+R_2\le\frac1n I(V^nW^n;Z^n), \qquad (13)$$
$$R_0+R_1+R_2\le\frac1n\big[I(U^nW^n;Y^n)+I(V^n;Z^n|W^n)-I(U^n;V^n|W^n)\big], \qquad (14)$$
$$R_0+R_1+R_2\le\frac1n\big[I(U^n;Y^n|W^n)+I(V^nW^n;Z^n)-I(U^n;V^n|W^n)\big], \qquad (15)$$
$$2R_0+R_1+R_2\le\frac1n\big[I(U^nW^n;Y^n)+I(V^nW^n;Z^n)-I(U^n;V^n|W^n)\big], \qquad (16)$$

where $U^n,V^n,W^n,X^n,Y^n,Z^n$ are distributed according to $p(u^n,v^n,w^n,x^n)\,q^n(y^n,z^n|x^n)$. Clearly, if we assume that $(U^n,V^n,W^n,X^n)$ are $n$ i.i.d. copies of $p(u,v,w,x)$, we get back the one-letter version of Marton's inner bound.
Assume that

$$p(u^n,v^n,w^n)=\prod_{i=1}^n r(u_i,v_i,w_i).$$

Note that the $(U_i,V_i,W_i)$ are i.i.d. copies of $(U,V,W)$ distributed according to $r(u,v,w)$. We further use the given conditional law $r(x|u_1,v_1,w_1,u_2,v_2,w_2)$ to define the joint distribution of $X^n$ given $U^n,V^n,W^n$ as

$$p(x^n|u^n,v^n,w^n)=\prod_{i=2}^n r(x_i|u_{i-1},v_{i-1},w_{i-1},u_i,v_i,w_i),$$

with $X_1$ set to a constant. We then have

$$I(U^nW^n;Y^n)\ge(n-1)\,I(U_2W_2;Y_1Y_2U_1W_1).$$

Similarly, $I(V^nW^n;Z^n)\ge(n-1)\,I(V_2W_2;V_1W_1Z_1Z_2)$.

Next, note that

$$I(V^n;Z^n|W^n)=H(V^n|W^n)-H(V^n|W^nZ^n)\ge(n-1)\,I(V_2;Z_1Z_2V_1W_1|W_2).$$

Similarly, $I(U^n;Y^n|W^n)\ge(n-1)\,I(U_2;Y_1Y_2U_1W_1|W_2)$. Lastly, note that $I(U^n;V^n|W^n)=n\cdot I(U;V|W)$, since the triples $(U_i,V_i,W_i)$ are i.i.d. We obtain the desired result by substituting these values into equations (12)-(16) and letting $n\to\infty$. ■
Appendix A

Suppose $p_0(u,v,w,x)$ is a joint distribution that maximizes $\lambda I(W;Y)+(1-\lambda)I(W;Z)+I(U;Y|W)+I(V;Z|W)-I(U;V|W)$ and, among all such joint distributions, has the largest value of $I(W;Y)+I(W;Z)$. In this appendix, we prove that one can find $\tilde p(u,v,w,x)$ such that

• $\lambda I(W;Y)+(1-\lambda)I(W;Z)+I(U;Y|W)+I(V;Z|W)-I(U;V|W)$ has the same value under $\tilde p$ as under $p_0$,

• $I(W;Y)+I(W;Z)$ has the same value under $\tilde p$ as under $p_0$,

• $|\mathcal U|\le\min(|\mathcal X|,|\mathcal Y|)$,

• $|\mathcal V|\le\min(|\mathcal X|,|\mathcal Z|)$,

• $|\mathcal W|\le|\mathcal X|$,

• $H(X|UVW)=0$.
We begin by reducing the cardinality of $W$. Assume that $|\mathcal W|>|\mathcal X|$ and $p(w)>0$ for all $w$. The requirement $E[L(W)|X]=0$ imposes only $|\mathcal X|$ linear constraints on the $|\mathcal W|>|\mathcal X|$ values $L(w)$. There must therefore exist a function $L:\mathcal W\to\mathbb R$ such that

$$E[L(W)|X]=0,\qquad\exists w:\ p(w)\neq0,\ L(w)\neq0.$$

Let us perturb $p_0(u,v,w,x)$ along $L$ as follows:

$$p_\epsilon(u,v,w,x,y,z)=p_0(u,v,w,x,y,z)\cdot[1+\epsilon L(w)],$$

where $\epsilon$ is a real number in some interval $[-\epsilon_1,\epsilon_2]$ for some positive reals $\epsilon_1$ and $\epsilon_2$. These are taken as large as possible subject to $1+\epsilon L(w)\ge0$ for all $w$. The perturbation changes only the marginal $p(w)$. It leaves $p(x,y,z)$ and $p(u,v,x|w)$ unchanged.

Consider the expression $\lambda I(W;Y)+(1-\lambda)I(W;Z)+I(U;Y|W)+I(V;Z|W)-I(U;V|W)$ at $p_\epsilon(u,v,w,x,y,z)$. It can be verified that this expression is a linear function of $\epsilon$ under the perturbation. A maximum of the expression occurs at $\epsilon=0$, which is a point strictly inside the interval $[-\epsilon_1,\epsilon_2]$. So the expression must be a constant function of $\epsilon$.

Next, consider the expression $I(W;Y)+I(W;Z)$ at $p_\epsilon(u,v,w,x,y,z)$. It can be verified that this expression is also linear in $\epsilon$ under the perturbation. Recall that $p_0(u,v,w,x)$ has the largest value of $I(W;Y)+I(W;Z)$ among all joint distributions that maximize $\lambda I(W;Y)+(1-\lambda)I(W;Z)+I(U;Y|W)+I(V;Z|W)-I(U;V|W)$, and every $p_\epsilon$ is such a maximizer. Thus a maximum of $I(W;Y)+I(W;Z)$ occurs at $\epsilon=0$, a point strictly inside the interval $[-\epsilon_1,\epsilon_2]$. This can only happen when $I(W;Y)+I(W;Z)$ is a constant function of $\epsilon$.

Now take $\epsilon=-\epsilon_1$ or $\epsilon=\epsilon_2$. This gives a joint distribution with the same values of $\lambda I(W;Y)+(1-\lambda)I(W;Z)+I(U;Y|W)+I(V;Z|W)-I(U;V|W)$ and $I(W;Y)+I(W;Z)$, but with a smaller support on $W$. By repeating this argument, one can reduce the cardinality of $W$ to $|\mathcal X|$.
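The following Python sketch illustrates the $W$-perturbation on a random example with $|\mathcal W|=|\mathcal X|+1$. It builds $L$ from the null space of the matrix $p(x,w)$, so that $E[L(W)|X]=0$. It then checks three things:

• the marginal of $X$ is untouched;

• both the objective and $I(W;Y)+I(W;Z)$ are affine in $\epsilon$ (their second differences vanish);

• the endpoint $\epsilon=\epsilon_2$ removes a symbol of $W$.

The distribution is random rather than a maximizer, so the slopes need not be zero here. The sizes, $\lambda$ and the helper names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
nU, nV, nW, nX, nY, nZ = 2, 2, 4, 3, 2, 2   # |W| = |X| + 1 (hypothetical sizes)
lam = 0.3                                    # hypothetical weight lambda in [0, 1]

p0 = rng.dirichlet(np.ones(nU * nV * nW * nX)).reshape(nU, nV, nW, nX)
qY = rng.dirichlet(np.ones(nY), size=nX)     # hypothetical q(y|x)
qZ = rng.dirichlet(np.ones(nZ), size=nX)     # hypothetical q(z|x)

def H(a):
    a = a[a > 1e-15]                         # drop zero (and round-off) masses
    return -np.sum(a * np.log(a))

def evaluate(p):
    """Return (objective, I(W;Y) + I(W;Z)) in nats for p[u, v, w, x]."""
    p_uwy = np.einsum('uvwx,xy->uwy', p, qY)
    p_vwz = np.einsum('uvwx,xz->vwz', p, qZ)
    p_w = p.sum(axis=(0, 1, 3))
    p_wy, p_wz = p_uwy.sum(axis=0), p_vwz.sum(axis=0)
    p_uw, p_vw, p_uvw = p.sum(axis=(1, 3)), p.sum(axis=(0, 3)), p.sum(axis=3)
    i_wy = H(p_w) + H(p_wy.sum(axis=0)) - H(p_wy)
    i_wz = H(p_w) + H(p_wz.sum(axis=0)) - H(p_wz)
    i_uy_w = H(p_uw) + H(p_wy) - H(p_uwy) - H(p_w)
    i_vz_w = H(p_vw) + H(p_wz) - H(p_vwz) - H(p_w)
    i_uv_w = H(p_uw) + H(p_vw) - H(p_uvw) - H(p_w)
    return lam * i_wy + (1 - lam) * i_wz + i_uy_w + i_vz_w - i_uv_w, i_wy + i_wz

# L lies in the null space of the |X| x |W| matrix p(x, w), i.e. E[L(W) | X] = 0.
A = p0.sum(axis=(0, 1)).T                    # A[x, w] = p(w, x)
L = np.linalg.svd(A)[2][-1]                  # last right-singular vector
eps1 = np.min(1.0 / L[L > 0])                # largest eps1 with 1 - eps1 * L(w) >= 0
eps2 = np.min(-1.0 / L[L < 0])               # largest eps2 with 1 + eps2 * L(w) >= 0

def perturb(eps):
    return p0 * (1 + eps * L)[None, None, :, None]

grid = np.linspace(-eps1, eps2, 5)
values = np.array([evaluate(perturb(e)) for e in grid])    # shape (5, 2)
second_diff = values[:-2] - 2 * values[1:-1] + values[2:]  # zero for affine functions
x_marginal_gap = np.max(np.abs(perturb(eps2).sum(axis=(0, 1, 2)) - p0.sum(axis=(0, 1, 2))))
support_after = np.sum(perturb(eps2).sum(axis=(0, 1, 3)) > 1e-12)
print(np.max(np.abs(second_diff)), x_marginal_gap, support_after)
```

The SVD is simply a convenient way to obtain a nonzero null-space vector. Any solution of $E[L(W)|X]=0$ would serve equally well.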

Next, we show how one can reduce the cardinality of $U$ to find $\tilde p(u,v,w,x)$ such that

• $\lambda I(W;Y)+(1-\lambda)I(W;Z)+I(U;Y|W)+I(V;Z|W)-I(U;V|W)$ has the same value under $\tilde p$ as under $p_0$,

• $I(W;Y)+I(W;Z)$ has the same value under $\tilde p$ as under $p_0$,

• $|\mathcal U|\le\min(|\mathcal X|,|\mathcal Y|)$,

• $|\mathcal W|\le|\mathcal X|$.

We can repeat a similar procedure to impose the constraint $|\mathcal V|\le\min(|\mathcal X|,|\mathcal Z|)$. Imposing the extra constraint $H(X|UVW)=0$ will be discussed at the end.

If $|\mathcal X|\le|\mathcal Y|$, it suffices to establish the cardinality bound of $|\mathcal X|$ on $U$. This bound is proved in Theorem 1 of [8]. It can be shown using perturbations of the type $L:\mathcal U\times\mathcal W\to\mathbb R$ where

$$E[L(U,W)|WX]=0.$$

These perturbations preserve the marginal distribution $p(w,x)$, and thus also $I(W;Y)+I(W;Z)$. The interesting case is therefore $|\mathcal X|>|\mathcal Y|$.

Assume that $|\mathcal U|>|\mathcal Y|$. Suppose that for every $w\in\mathcal W$, $p(u|w)\neq0$ for at most $|\mathcal Y|$ elements $u$. Then we are done: we can relabel the elements in the range of $U$ (separately for each $w$) so that only an alphabet of size at most $|\mathcal Y|$ is used, without affecting any of the mutual information terms in the expression of interest. Otherwise, there is some $w$ for which $p(u|w)\neq0$ for more than $|\mathcal Y|$ elements $u$. The requirement $E[L(U,W)|W=w,Y]=0$ imposes only $|\mathcal Y|$ linear constraints on the values $L(u,w)$; we take $L(u,w')=0$ for $w'\neq w$. There must therefore exist a function $L:\mathcal U\times\mathcal W\to\mathbb R$ such that

$$E[L(U,W)|WY]=0,\qquad\exists(u,w):\ p_0(u,w)\neq0,\ L(u,w)\neq0.$$

Let us perturb $p_0(u,v,w,x)$ along the random variable $L:\mathcal U\times\mathcal W\to\mathbb R$. The random variables $U,V,W,X,Y,Z$ are distributed according to $p_\epsilon(u,v,w,x,y,z)$, defined as follows:

$$p_\epsilon(u,v,w,x,y,z)=p_0(u,v,w,x,y,z)\cdot[1+\epsilon L(u,w)],$$

where $\epsilon$ is a real number in some interval $[-\epsilon_1,\epsilon_2]$.

The first derivative of $\lambda I(W;Y)+(1-\lambda)I(W;Z)+I(U;Y|W)+I(V;Z|W)-I(U;V|W)$ with respect to $\epsilon$, at $\epsilon=0$, should be zero. Since

$$\lambda I(W;Y)+(1-\lambda)I(W;Z)+I(U;Y|W)+I(V;Z|W)-I(U;V|W)$$
$$=\lambda\big(H(W)+H(Y)-H(WY)\big)+(1-\lambda)\big(H(W)+H(Z)-H(WZ)\big)$$
$$+H(YW)+H(ZW)-H(UYW)-H(VZW)+H(UVW)-H(W),$$

we will have

$$\lambda\big(H_L(W)+H_L(Y)-H_L(WY)\big)+(1-\lambda)\big(H_L(W)+H_L(Z)-H_L(WZ)\big)$$
$$+H_L(YW)+H_L(ZW)-H_L(UYW)-H_L(VZW)+H_L(UVW)-H_L(W)=0,$$

where $H_L(W)$ denotes $\sum_w E[L|W=w]\,p(w)\log\frac{1}{p(w)}$, and similarly for the other terms. Using Lemma 2 of [8] and the previous display, which makes the first-order term vanish, we have

$$\Big[\lambda I(W;Y)+(1-\lambda)I(W;Z)+I(U;Y|W)+I(V;Z|W)-I(U;V|W)\Big]_{p_\epsilon}$$
$$=\Big[\lambda I(W;Y)+(1-\lambda)I(W;Z)+I(U;Y|W)+I(V;Z|W)-I(U;V|W)\Big]_{p_0}$$
$$+\lambda\big(-E[r(\epsilon E[L|W])]-E[r(\epsilon E[L|Y])]+E[r(\epsilon E[L|WY])]\big)$$
$$+(1-\lambda)\big(-E[r(\epsilon E[L|W])]-E[r(\epsilon E[L|Z])]+E[r(\epsilon E[L|WZ])]\big)$$
$$-E[r(\epsilon E[L|YW])]-E[r(\epsilon E[L|ZW])]+E[r(\epsilon E[L|UYW])]+E[r(\epsilon E[L|VWZ])]$$
$$-E[r(\epsilon E[L|UVW])]+E[r(\epsilon E[L|W])],$$



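where $r(x)=(1+x)\log(1+x)$ and the expectations are taken under $p_0$. We have $E[L(U,W)|WY]=0$, so $E[L|W]=E[L|Y]=E[L|YW]=0$. Also, $L$ is a function of $(U,W)$, so $E[L|UYW]=E[L|UVW]=L$. Hence

$$\Big[\lambda I(W;Y)+(1-\lambda)I(W;Z)+I(U;Y|W)+I(V;Z|W)-I(U;V|W)\Big]_{p_\epsilon}$$
$$=\Big[\lambda I(W;Y)+(1-\lambda)I(W;Z)+I(U;Y|W)+I(V;Z|W)-I(U;V|W)\Big]_{p_0}$$
$$+(1-\lambda)\big(-E[r(\epsilon E[L|Z])]+E[r(\epsilon E[L|WZ])]\big)-E[r(\epsilon E[L|ZW])]+E[r(\epsilon E[L|VWZ])].$$

The function $r(x)=(1+x)\log(1+x)$ is convex. Moreover, $E[L|Z]=E\big[E[L|WZ]\,\big|\,Z\big]$ and $E[L|ZW]=E\big[E[L|VWZ]\,\big|\,ZW\big]$. Jensen's inequality therefore gives

$$-E[r(\epsilon E[L|Z])]+E[r(\epsilon E[L|WZ])]\ge0,$$
$$-E[r(\epsilon E[L|ZW])]+E[r(\epsilon E[L|VWZ])]\ge0.$$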
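The expansion used above can be checked numerically in its simplest form. We read Lemma 2 of [8], as it is used in the derivation, as follows. Under $p_\epsilon=p\,(1+\epsilon L)$, the entropy of a marginal changes as $H_\epsilon(A)=H(A)+\epsilon H_L(A)-E[r(\epsilon E[L|A])]$, with the expectation taken under $p$.

The Python sketch below verifies this identity for a random pmf. It also checks the Jensen step in its simplest instance, $E[r(\epsilon E[L|A])]\le E[r(\epsilon E[L|AB])]=E[r(\epsilon L)]$. The pmf, the direction $L$ and the step size are hypothetical.

```python
import numpy as np

def r(x):
    """r(x) = (1 + x) log(1 + x) in nats; convex on x > -1."""
    return (1 + x) * np.log1p(x)

def entropy(pmf):
    pmf = pmf[pmf > 0]
    return -np.sum(pmf * np.log(pmf))

rng = np.random.default_rng(2)
p = rng.dirichlet(np.ones(12)).reshape(3, 4)    # hypothetical joint pmf p(a, b)
L = rng.normal(size=(3, 4))                     # hypothetical perturbation direction L(a, b)
L -= np.sum(p * L)                              # enforce E[L] = 0 so p_eps stays normalized
eps = 0.5 / np.max(np.abs(L))                   # keeps 1 + eps * L(a, b) >= 1/2

p_eps = p * (1 + eps * L)                       # perturbed joint pmf
p_a = p.sum(axis=1)
cond_L = (p * L).sum(axis=1) / p_a              # E[L | A = a]

# Identity: H_eps(A) = H(A) + eps * H_L(A) - E[r(eps * E[L|A])]
H_L = np.sum(cond_L * p_a * np.log(1.0 / p_a))
lhs = entropy(p_eps.sum(axis=1))
rhs = entropy(p_a) + eps * H_L - np.sum(p_a * r(eps * cond_L))

# Jensen: E[r(eps * E[L|A])] <= E[r(eps * E[L|AB])] = E[r(eps * L)]
jensen_gap = np.sum(p * r(eps * L)) - np.sum(p_a * r(eps * cond_L))
print(abs(lhs - rhs), jensen_gap)
```

In the appendix, the roles of $A$ and $(A,B)$ are played by $Z$ and $WZ$ (or by $ZW$ and $VWZ$). The convexity argument is the same.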

Therefore, since $0\le\lambda\le1$, for any $\epsilon\in[-\epsilon_1,\epsilon_2]$ we have

$$\Big[\lambda I(W;Y)+(1-\lambda)I(W;Z)+I(U;Y|W)+I(V;Z|W)-I(U;V|W)\Big]_{p_\epsilon}$$
$$\ge\Big[\lambda I(W;Y)+(1-\lambda)I(W;Z)+I(U;Y|W)+I(V;Z|W)-I(U;V|W)\Big]_{p_0}.$$

Since $p_0$ is a maximizer, this implies that $\lambda I(W;Y)+(1-\lambda)I(W;Z)+I(U;Y|W)+I(V;Z|W)-I(U;V|W)$ is a constant function of $\epsilon$. The maximum of $I(W;Y)+I(W;Z)$ as a function of $\epsilon$ occurs at $\epsilon=0$. Therefore

$$I_L(W;Y)+I_L(W;Z)=0,$$

where $I_L(W;Y)$ denotes $\sum_{u,w,y}p(u,w,y)L(u,w)\log\frac{p(w,y)}{p(w)p(y)}$, and similarly for the other terms (see Lemma 2 of [8]).

Using Lemma 2 of [8], one can observe that $[I(W;Y)+I(W;Z)]_{p_\epsilon}-[I(W;Y)+I(W;Z)]_{p_0}$ equals

$$-E[r(\epsilon E[L|Z])]+E[r(\epsilon E[L|WZ])]\ge0.$$

Since $\epsilon=0$ is a maximizer, this can only happen when $I(W;Y)+I(W;Z)$ is a constant function of $\epsilon$. Now take $\epsilon=-\epsilon_1$ or $\epsilon=\epsilon_2$. This gives auxiliary random variables $(U,W)$ with a smaller support than that of $(U,W)$ under $p_0$. We can continue this process as long as there exists $w\in\mathcal W$ such that $p(u|w)\neq0$ for more than $|\mathcal Y|$ elements $u$.

It remains to show that one can impose the extra constraint $H(X|UVW)=0$. Fix $p(u,v,w)$. Consider the expressions $\lambda I(W;Y)+(1-\lambda)I(W;Z)+I(U;Y|W)+I(V;Z|W)-I(U;V|W)$ and $I(W;Y)+I(W;Z)$ as functions of the conditional distribution $r(x|u,v,w)$. We know that the former expression is maximized at $r(x|u,v,w)=p(x|u,v,w)$. Further, the extreme points of the set of all conditional distributions $r(x|u,v,w)$ satisfy $r(x|u,v,w)\in\{0,1\}$.

Both expressions are convex functions of $r(x|u,v,w)$. This is because $I(W;Y)$ is convex in the conditional distribution $p(y|w)$, which is linear in $r(x|u,v,w)$. Similarly, $I(U;Y|W=w)$ is convex for any fixed value of $w$. The term $I(U;V|W)$, which appears with a negative sign, is constant since the joint distribution $p(u,v,w)$ is fixed.

We can express $p(x|u,v,w)$ as a convex combination of the extreme points of the set of all conditional distributions $r(x|u,v,w)$. The maximum of $\lambda I(W;Y)+(1-\lambda)I(W;Z)+I(U;Y|W)+I(V;Z|W)-I(U;V|W)$ occurs at $p(x|u,v,w)$, and the expression is convex in $r(x|u,v,w)$. So the maximum must also occur at every extreme point that appears with positive weight in the combination.

One can then use the convexity of $I(W;Y)+I(W;Z)$ in $r(x|u,v,w)$ to show that the value of $I(W;Y)+I(W;Z)$ at all these extreme points must also equal its value at $p(x|u,v,w)$. Convexity bounds the value at $p(x|u,v,w)$ by the weighted average of the values at the extreme points. On the other hand, none of those values can exceed it, because the current distribution attains the largest value of $I(W;Y)+I(W;Z)$ among the maximizers.
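The extreme-point argument can be illustrated by brute force on a tiny example. This is a sketch under assumptions: the alphabet sizes, the random channel, $\lambda$ and the random $p(u,v,w)$ do not come from the paper.

With $p(u,v,w)$ held fixed, the code evaluates the objective at every deterministic conditional $r(x|u,v,w)\in\{0,1\}$ and at many random conditionals. By the convexity noted above, no random conditional should beat the best deterministic one.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
nU = nV = nW = 2
nX, nY, nZ = 2, 3, 3                          # hypothetical alphabet sizes
lam = 0.4                                     # hypothetical lambda in [0, 1]
p_uvw = rng.dirichlet(np.ones(nU * nV * nW)).reshape(nU, nV, nW)
qY = rng.dirichlet(np.ones(nY), size=nX)      # hypothetical q(y|x)
qZ = rng.dirichlet(np.ones(nZ), size=nX)      # hypothetical q(z|x)

def H(a):
    a = a[a > 1e-15]
    return -np.sum(a * np.log(a))

def objective(r):
    """lam I(W;Y) + (1-lam) I(W;Z) + I(U;Y|W) + I(V;Z|W) - I(U;V|W), r[u,v,w,x] = r(x|u,v,w)."""
    p = p_uvw[..., None] * r                  # p[u, v, w, x]
    p_uwy = np.einsum('uvwx,xy->uwy', p, qY)
    p_vwz = np.einsum('uvwx,xz->vwz', p, qZ)
    p_w = p_uvw.sum(axis=(0, 1))
    p_wy, p_wz = p_uwy.sum(axis=0), p_vwz.sum(axis=0)
    p_uw, p_vw = p_uvw.sum(axis=1), p_uvw.sum(axis=0)
    i_wy = H(p_w) + H(p_wy.sum(axis=0)) - H(p_wy)
    i_wz = H(p_w) + H(p_wz.sum(axis=0)) - H(p_wz)
    i_uy_w = H(p_uw) + H(p_wy) - H(p_uwy) - H(p_w)
    i_vz_w = H(p_vw) + H(p_wz) - H(p_vwz) - H(p_w)
    i_uv_w = H(p_uw) + H(p_vw) - H(p_uvw) - H(p_w)   # constant, since p(u,v,w) is fixed
    return lam * i_wy + (1 - lam) * i_wz + i_uy_w + i_vz_w - i_uv_w

def deterministic(fmap):
    """Conditional r(x|u,v,w) that puts all its mass on x = fmap[(u,v,w)]."""
    r = np.zeros((nU, nV, nW, nX))
    for idx, x in zip(itertools.product(range(nU), range(nV), range(nW)), fmap):
        r[idx + (x,)] = 1.0
    return r

best_det = max(objective(deterministic(f))
               for f in itertools.product(range(nX), repeat=nU * nV * nW))
random_vals = [objective(rng.dirichlet(np.ones(nX), size=(nU, nV, nW)))
               for _ in range(200)]
print(best_det, max(random_vals))
```

Exhaustive enumeration is only feasible for tiny alphabets, since there are $|\mathcal X|^{|\mathcal U||\mathcal V||\mathcal W|}$ deterministic maps. The point of the sketch is the comparison, not efficiency.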

Appendix B 

In this Appendix we complete the proof of Theorem 1. We prove that $\tilde p(X=x|U=u_i)=p(X=x|U=u_i)$ for all $i=1,2,\ldots,|\mathcal U|$ and $x$, and similarly that $\tilde p(X=x|V=v_j)=p(X=x|V=v_j)$ for all $j=1,2,\ldots,|\mathcal V|$ and $x$.

Note that 

p{X = x\U = u^) = 
E,P(V = Vj\U = Ui)p{X = x\U = u,,V = V,) = 

E,p(v = v,\u = ^OEfeioP(r«.. = = ^] = 

Y.,p{v = v,\u = - ^)i[CoK,^,) = x] + 

EfciiEjP(1^ = -"jW = u,):;^ak'i-[£,k{ui,Vj) = x]. 



Note that p{V = v,\U = u.) = ^^^^uS"'^ = 



Therefore 



p{X^x\U^u,) = 
Ej pIu=u^) ^\-^^{u^,Vj) = x\ + 

- p(u=u^) Ej iKoK, Vj) ^x\ + 



p{U=u,) T,k=iakT,j ^^kiu^,Vj) ^ x]. 



But since 



M 

E 



the profiles of the i rows must also satisfy the same property: 

M 

^l[^o(wj,Wj) = = = x]. 



k=l 



Therefore, 



piX = x\U = u,) = 

E,^;(ft)iKo(«.,f,) = .T] + o-o = 

Ej p(a=„,) lKo(M»,^'j) = a:] =piX = x\U = Ui). 

The equation $\tilde p(X=x|V=v_j)=p(X=x|V=v_j)$ for all $j=1,2,\ldots,|\mathcal V|$ and $x$ can be proved similarly.



Appendix C 

Note that

$$p(X=x_0|U=u_i,T_{i,j}=0)=p(X=x_0|U=u_i,T_{i,j}=0,V=v_j)\,p(V=v_j|U=u_i,T_{i,j}=0)$$
$$+\,p(X=x_0|U=u_i,T_{i,j}=0,V\neq v_j)\,p(V\neq v_j|U=u_i,T_{i,j}=0).$$

Under the event $(U,V)=(u_i,v_j)$ and $T_{i,j}=0$, $X$ is equal to $x_0$. So the term $p(X=x_0|U=u_i,T_{i,j}=0,V=v_j)$ is equal to one. Since $(U,V)$ is independent of $T_{i,j}$, we have

$$p(V=v_j|U=u_i,T_{i,j}=0)=p(V=v_j|U=u_i),$$
$$p(V\neq v_j|U=u_i,T_{i,j}=0)=p(V\neq v_j|U=u_i).$$

Lastly, $p(X=x_0|U=u_i,T_{i,j}=0,V\neq v_j)$ is equal to $p(X=x_0|U=u_i,V\neq v_j)$. This is because, under the event $\{U=u_i,V\neq v_j\}$, $X$ is independent of $T_{i,j}$ (recall that the $T_{\cdot,\cdot}$ random variables are mutually independent). Therefore,

$$p(X=x_0|U=u_i,T_{i,j}=0)=p(V=v_j|U=u_i)+p(X=x_0|U=u_i,V\neq v_j)\,p(V\neq v_j|U=u_i). \qquad (17)$$
Next, note that

$$p(X=x_0|U=u_i,T_{i,j}=1)=p(X=x_0|U=u_i,T_{i,j}=1,V=v_j)\,p(V=v_j|U=u_i,T_{i,j}=1)$$
$$+\,p(X=x_0|U=u_i,T_{i,j}=1,V\neq v_j)\,p(V\neq v_j|U=u_i,T_{i,j}=1).$$

Under the event $(U,V)=(u_i,v_j)$ and $T_{i,j}=1$, $X$ is equal to $x_1$. So the term $p(X=x_0|U=u_i,T_{i,j}=1,V=v_j)$ is equal to zero. Following an argument like the one above, one can show that

$$p(X=x_0|U=u_i,T_{i,j}=1)=p(X=x_0|U=u_i,V\neq v_j)\,p(V\neq v_j|U=u_i). \qquad (18)$$

Compare equations (17) and (18), and note that $p(V=v_j|U=u_i)>0$. We conclude that

$$p(X=x_0|U=u_i,T_{i,j}=0)\neq p(X=x_0|U=u_i,T_{i,j}=1).$$

The proof of

$$p(X=x_1|U=u_i,T_{i,j}=0)\neq p(X=x_1|U=u_i,T_{i,j}=1)$$

is similar.

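It remains to show that for any $x\notin\{x_0,x_1\}$,

$$p(X=x|U=u_i,T_{i,j}=0)=p(X=x|U=u_i,T_{i,j}=1).$$

Note that

$$p(X=x|U=u_i,T_{i,j}=1)=p(X=x|U=u_i,T_{i,j}=1,V=v_j)\,p(V=v_j|U=u_i,T_{i,j}=1)$$
$$+\,p(X=x|U=u_i,T_{i,j}=1,V\neq v_j)\,p(V\neq v_j|U=u_i,T_{i,j}=1)$$
$$=0+p(X=x|U=u_i,V\neq v_j)\,p(V\neq v_j|U=u_i)$$
$$=p(X=x|U=u_i,T_{i,j}=0).$$

The last equality follows from the same argument applied with $T_{i,j}=0$, under which $X=x_0\neq x$ on the event $V=v_j$.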
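Equations (17) and (18), and the equality for $x\notin\{x_0,x_1\}$, can be checked on a small random instance of the construction used here. This is a sketch: the alphabet sizes, the law of $T_{i,j}$ and the choice of $u_i,v_j,x_0,x_1$ are hypothetical.

The code builds the joint law of $(U,V,T_{i,j},X)$ explicitly. On the cell $(u_i,v_j)$ it sets $X=x_0$ or $X=x_1$ according to $T_{i,j}$. It then obtains the conditional pmfs of $X$ given $(U=u_i,T_{i,j}=t)$ by marginalization.

```python
import numpy as np

rng = np.random.default_rng(4)
nU, nV, nX = 3, 3, 4
i, j, x0, x1 = 0, 1, 0, 1         # hypothetical choices of u_i, v_j, x_0, x_1
theta = 0.3                        # hypothetical P(T_ij = 1); T_ij is independent of (U, V)

p_uv = rng.dirichlet(np.ones(nU * nV)).reshape(nU, nV)    # p(u, v) > 0
p_x_uv = rng.dirichlet(np.ones(nX), size=(nU, nV))        # p(x | u, v) off the cell (u_i, v_j)

# Joint p[u, v, t, x]: on (u_i, v_j), X = x_0 if T_ij = 0 and X = x_1 if T_ij = 1.
p = np.zeros((nU, nV, 2, nX))
for t, p_t in enumerate([1 - theta, theta]):
    cond = p_x_uv.copy()
    cond[i, j] = 0.0
    cond[i, j, x0 if t == 0 else x1] = 1.0
    p[:, :, t, :] = p_uv[:, :, None] * p_t * cond

def p_x_given(t):
    """p(X = x | U = u_i, T_ij = t), obtained by marginalizing the joint."""
    joint = p[i, :, t, :].sum(axis=0)
    return joint / joint.sum()

a, b = p_x_given(0), p_x_given(1)
p_v = p_uv[i] / p_uv[i].sum()                            # p(v | u_i)
off = np.arange(nV) != j
rest = (p_v[off, None] * p_x_uv[i, off]).sum(axis=0)     # p(X=x | u_i, V != v_j) p(V != v_j | u_i)
others = [x for x in range(nX) if x not in (x0, x1)]

print(a[x0], p_v[j] + rest[x0])   # both sides of (17)
print(b[x0], rest[x0])            # both sides of (18)
```

The difference between the two printed values of $p(X=x_0|U=u_i,T_{i,j}=\cdot)$ is exactly $p(V=v_j|U=u_i)>0$, which is the comparison of (17) with (18).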



Appendix D 

We prove the statement by contradiction. Assume that for every $y$,

$$p(Y=y|U=u_i,T_{i,j}=0)=p(Y=y|U=u_i,T_{i,j}=1).$$

Since $Y$ depends on $(U,T_{i,j})$ only through $X$, we have

$$p(Y=y|U=u_i,T_{i,j}=0)=\sum_x p(Y=y|U=u_i,T_{i,j}=0,X=x)\,p(X=x|U=u_i,T_{i,j}=0)$$
$$=p(Y=y|X=x_0)\,p(X=x_0|U=u_i,T_{i,j}=0)+p(Y=y|X=x_1)\,p(X=x_1|U=u_i,T_{i,j}=0)$$
$$+\sum_{x\notin\{x_0,x_1\}}p(Y=y|X=x)\,p(X=x|U=u_i,T_{i,j}=0).$$

Similarly,

$$p(Y=y|U=u_i,T_{i,j}=1)=p(Y=y|X=x_0)\,p(X=x_0|U=u_i,T_{i,j}=1)+p(Y=y|X=x_1)\,p(X=x_1|U=u_i,T_{i,j}=1)$$
$$+\sum_{x\notin\{x_0,x_1\}}p(Y=y|X=x)\,p(X=x|U=u_i,T_{i,j}=1).$$

It was shown in Appendix C that

$$p(X=x_0|U=u_i,T_{i,j}=0)\neq p(X=x_0|U=u_i,T_{i,j}=1),$$
$$p(X=x_1|U=u_i,T_{i,j}=0)\neq p(X=x_1|U=u_i,T_{i,j}=1).$$

But for any $x\notin\{x_0,x_1\}$,

$$p(X=x|U=u_i,T_{i,j}=0)=p(X=x|U=u_i,T_{i,j}=1). \qquad (19)$$

Thus, we must have

$$p(Y=y|X=x_0)\,p(X=x_0|U=u_i,T_{i,j}=0)+p(Y=y|X=x_1)\,p(X=x_1|U=u_i,T_{i,j}=0)$$
$$=p(Y=y|X=x_0)\,p(X=x_0|U=u_i,T_{i,j}=1)+p(Y=y|X=x_1)\,p(X=x_1|U=u_i,T_{i,j}=1).$$

This implies that

$$p(Y=y|X=x_0)\big[p(X=x_0|U=u_i,T_{i,j}=1)-p(X=x_0|U=u_i,T_{i,j}=0)\big]$$
$$=p(Y=y|X=x_1)\big[p(X=x_1|U=u_i,T_{i,j}=0)-p(X=x_1|U=u_i,T_{i,j}=1)\big].$$

Both bracketed differences are nonzero by what was proved in Appendix C.

On the other hand, equation (19), together with the fact that each conditional pmf of $X$ sums to one, gives

$$p(X=x_0|U=u_i,T_{i,j}=0)+p(X=x_1|U=u_i,T_{i,j}=0)=p(X=x_0|U=u_i,T_{i,j}=1)+p(X=x_1|U=u_i,T_{i,j}=1).$$

This implies that the two bracketed differences above are equal. Dividing by this common nonzero value, we obtain

$$p(Y=y|X=x_1)=p(Y=y|X=x_0)\quad\text{for every }y.$$

But we know that $p(Y=y|X=x_0)\neq p(Y=y|X=x_1)$ for some $y$, since the input values $x_0$ and $x_1$ are distinguishable by the $Y$ receiver. This is a contradiction.
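Combine (17), (18) and (19) with the Markov chain $(U,T_{i,j})\to X\to Y$. In the notation above this gives

$$p(Y=y|U=u_i,T_{i,j}=0)-p(Y=y|U=u_i,T_{i,j}=1)=p(V=v_j|U=u_i)\,\big[p(Y=y|X=x_0)-p(Y=y|X=x_1)\big].$$

This difference is nonzero for some $y$ exactly when $x_0$ and $x_1$ are distinguishable at the $Y$ receiver. The Python sketch below checks this identity on a hypothetical random instance. It uses the same construction as in Appendix C, and the sizes and channel are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
nU, nV, nX, nY = 3, 3, 4, 3
i, j, x0, x1 = 0, 1, 0, 1         # hypothetical u_i, v_j, x_0, x_1
theta = 0.3                        # hypothetical P(T_ij = 1)

p_uv = rng.dirichlet(np.ones(nU * nV)).reshape(nU, nV)
p_x_uv = rng.dirichlet(np.ones(nX), size=(nU, nV))
qY = rng.dirichlet(np.ones(nY), size=nX)   # hypothetical q(y|x); rows x_0, x_1 differ almost surely

p = np.zeros((nU, nV, 2, nX))              # joint p[u, v, t, x], as in Appendix C
for t, p_t in enumerate([1 - theta, theta]):
    cond = p_x_uv.copy()
    cond[i, j] = 0.0
    cond[i, j, x0 if t == 0 else x1] = 1.0
    p[:, :, t, :] = p_uv[:, :, None] * p_t * cond

def p_y_given(t):
    """p(Y = y | U = u_i, T_ij = t), using the Markov chain (U, T_ij) -> X -> Y."""
    joint = p[i, :, t, :].sum(axis=0) @ qY
    return joint / joint.sum()

diff = p_y_given(0) - p_y_given(1)
predicted = (p_uv[i, j] / p_uv[i].sum()) * (qY[x0] - qY[x1])
print(diff, predicted)
```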

Appendix E 

The proof follows from the following two statements.

Statement 1: Assume that $p^*(u,v,w,x)$ is an arbitrary joint distribution that maximizes $\lambda I(W;Y)+(1-\lambda)I(W;Z)+I(U;Y|W)+I(V;Z|W)-I(U;V|W)$ and has the largest value of $I(W;Y)+I(W;Z)$ among all maximizing joint distributions. Then for every $w$, $p^*(x|w)$ must belong to the set $\mathcal T(q(y,z|x))$ defined as follows. Let $\mathcal T(q(y,z|x))$ be the set of pmfs $t(x)$ on $\mathcal X$ such that

$$\max_{p(u,v,w|x)t(x)q(y,z|x)}\big\{\lambda I(W;Y)+(1-\lambda)I(W;Z)+I(U;Y|W)+I(V;Z|W)-I(U;V|W)\big\}=\max_{p(u,v|x)t(x)q(y,z|x)}\big(I(U;Y)+I(V;Z)-I(U;V)\big),$$

and $I(W;Y)=I(W;Z)=0$ for any¹ pmf $p(u,v,w|x)t(x)$ that maximizes the expression $\lambda I(W;Y)+(1-\lambda)I(W;Z)+I(U;Y|W)+I(V;Z|W)-I(U;V|W)$.²

Statement 2: Let $q(y,z|x)$ be a general broadcast channel, and let $t(x)\in\mathcal T(q(y,z|x))$. Consider the maximization problem

$$\max_{p(u,v|x)t(x)q(y,z|x)}\big(I(U;Y)+I(V;Z)-I(U;V)\big).$$

Assume that a maximum occurs at $p^*(u,v|x)$. Then the following hold for random variables $(U,V,X,Y,Z)\sim p^*(u,v|x)t(x)q(y,z|x)$:

• $I(\tilde U;Y)\ge I(\tilde U;VZ)$ for every $\tilde U$ such that $\tilde U\to U\to VXYZ$ forms a Markov chain.

• $I(\tilde V;Z)\ge I(\tilde V;UY)$ for every $\tilde V$ such that $\tilde V\to V\to UXYZ$ forms a Markov chain.

A. Proof of Statement 1:

Assume that the conditional pmf of $X$ given $W=w$ does not belong to $\mathcal T$ for some $w$ with $p^*(w)>0$. Then, by the definition of $\mathcal T$, at least one of the following must hold.

Case 1: Corresponding to $p^*_{X|W=w}(x)$, there is a conditional distribution $\hat p(\hat u,\hat v,\hat w|x)$ such that

$$I(U;Y|W=w)+I(V;Z|W=w)-I(U;V|W=w)<\lambda I(\hat W;\hat Y)+(1-\lambda)I(\hat W;\hat Z)+I(\hat U;\hat Y|\hat W)+I(\hat V;\hat Z|\hat W)-I(\hat U;\hat V|\hat W), \qquad (20)$$

where $(\hat U,\hat V,\hat W,\hat X,\hat Y,\hat Z)\sim\hat p(\hat u,\hat v,\hat w|\hat x)\,p^*_{X|W=w}(\hat x)\,q(\hat y,\hat z|\hat x)$.

¹Note that such a pmf may not be unique.

²We have used maximum rather than supremum in the above conditions, since cardinality bounds on the auxiliary random variables exist [8].



Case 2: Corresponding to $p^*_{X|W=w}(x)$, there is a conditional distribution $\hat p(\hat u,\hat v,\hat w|x)$ such that

$$I(U;Y|W=w)+I(V;Z|W=w)-I(U;V|W=w)=\lambda I(\hat W;\hat Y)+(1-\lambda)I(\hat W;\hat Z)+I(\hat U;\hat Y|\hat W)+I(\hat V;\hat Z|\hat W)-I(\hat U;\hat V|\hat W),$$

but $I(\hat W;\hat Y)+I(\hat W;\hat Z)>0$, where $(\hat U,\hat V,\hat W,\hat X,\hat Y,\hat Z)\sim\hat p(\hat u,\hat v,\hat w|\hat x)\,p^*_{X|W=w}(\hat x)\,q(\hat y,\hat z|\hat x)$.

Define $\tilde U,\tilde V,\tilde W$, jointly distributed with $U,V,W,X,Y,Z$, as follows. Whenever $W\neq w$, set $\tilde U=U$, $\tilde V=V$ and $\tilde W=W$. For $W=w$, the Markov chain $\tilde U\tilde V\tilde W\to X\to UVWYZ$ holds and $p(\tilde u,\tilde v,\tilde w|x)=\hat p(\tilde u,\tilde v,\tilde w|x)$. Next, let $U'=\tilde U$, $V'=\tilde V$ and $W'=\tilde WW$.

If case 1 holds, we prove that

$$\lambda I(W';Y)+(1-\lambda)I(W';Z)+I(U';Y|W')+I(V';Z|W')-I(U';V'|W')>\lambda I(W;Y)+(1-\lambda)I(W;Z)+I(U;Y|W)+I(V;Z|W)-I(U;V|W),$$

which contradicts the optimality of $p^*$. If case 2 holds, we prove that the left-hand side is at least as large as the right-hand side (hence equal to it, by optimality), but that $I(W';Y)+I(W';Z)>I(W;Y)+I(W;Z)$. This contradicts the choice of $p^*$ as the maximizer with the largest value of $I(W;Y)+I(W;Z)$.

Assume that case 1 holds. Since $W'=\tilde WW$, we have $I(W';Y)=I(W;Y)+I(\tilde W;Y|W)$ and $I(W';Z)=I(W;Z)+I(\tilde W;Z|W)$. So we need to show that

$$\lambda I(\tilde W;Y|W)+(1-\lambda)I(\tilde W;Z|W)+I(\tilde U;Y|\tilde WW)+I(\tilde V;Z|\tilde WW)-I(\tilde U;\tilde V|\tilde WW)>I(U;Y|W)+I(V;Z|W)-I(U;V|W).$$

Recall that whenever $W\neq w$, the random variables $\tilde U,\tilde V,\tilde W$ were defined to be equal to $U,V,W$, so on that event both sides contribute equally. Therefore we need to show that

$$p(W=w)\big[\lambda I(\tilde W;Y|W=w)+(1-\lambda)I(\tilde W;Z|W=w)+I(\tilde U;Y|W=w,\tilde W)+I(\tilde V;Z|W=w,\tilde W)-I(\tilde U;\tilde V|W=w,\tilde W)\big]$$
$$>p(W=w)\big[I(U;Y|W=w)+I(V;Z|W=w)-I(U;V|W=w)\big].$$

On the event $W=w$, the random variables $\tilde U,\tilde V,\tilde W$ were defined so that $p(\tilde u,\tilde v,\tilde w|x)$ equals $\hat p(\tilde u,\tilde v,\tilde w|x)$. Furthermore, the conditional distribution of $X$ given $W=w$ is $p^*(x|W=w)$. Therefore $I(\tilde W;Y|W=w)=I(\hat W;\hat Y)$, $I(\tilde W;Z|W=w)=I(\hat W;\hat Z)$, $I(\tilde U;Y|W=w,\tilde W)=I(\hat U;\hat Y|\hat W)$, and so on. Thus it remains to show that

$$\lambda I(\hat W;\hat Y)+(1-\lambda)I(\hat W;\hat Z)+I(\hat U;\hat Y|\hat W)+I(\hat V;\hat Z|\hat W)-I(\hat U;\hat V|\hat W)>I(U;Y|W=w)+I(V;Z|W=w)-I(U;V|W=w).$$

This holds because of equation (20). This concludes the proof for case 1.



Now assume that case 2 holds. Following the above proof for case 1, one can get

$$\lambda I(W';Y)+(1-\lambda)I(W';Z)+I(U';Y|W')+I(V';Z|W')-I(U';V'|W')\ge\lambda I(W;Y)+(1-\lambda)I(W;Z)+I(U;Y|W)+I(V;Z|W)-I(U;V|W).$$

Note that $I(W';Y)+I(W';Z)=I(W;Y)+I(\tilde W;Y|W)+I(W;Z)+I(\tilde W;Z|W)$. Thus we need to show that $I(\tilde W;Y|W)+I(\tilde W;Z|W)>0$. Note that

$$I(\tilde W;Y|W)+I(\tilde W;Z|W)=p(W=w)\big(I(\tilde W;Y|W=w)+I(\tilde W;Z|W=w)\big)=p(W=w)\big(I(\hat W;\hat Y)+I(\hat W;\hat Z)\big)>0.$$
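The proof of Statement 1 uses two facts. The first is the chain rule $I(\tilde WW;Y)=I(W;Y)+I(\tilde W;Y|W)$. The second is that $I(\tilde W;Y|W)$ reduces to the single term $p(W=w)\,I(\tilde W;Y|W=w)$ when $\tilde W$ is determined by $W$ off the event $W=w$.

The sketch below checks both facts numerically. It uses a hypothetical random pmf on $(\tilde W,W,Y)$ in which $\tilde W$ is constant on $W=1$. Here $W=0$ plays the role of the event $W=w$.

```python
import numpy as np

def H(a):
    a = np.asarray(a).ravel()
    a = a[a > 0]
    return -np.sum(a * np.log(a))

rng = np.random.default_rng(6)
# Hypothetical joint pmf p[w~, w, y]: 3 values of W~, 2 of W, 4 of Y.
p = rng.dirichlet(np.ones(3 * 2 * 4)).reshape(3, 2, 4)
p[1:, 1, :] = 0.0                  # on W = 1 (the role of "W != w"), W~ is constant
p /= p.sum()

p_tw, p_wy = p.sum(axis=2), p.sum(axis=0)
p_w, p_y = p.sum(axis=(0, 2)), p.sum(axis=(0, 1))

i_pair = H(p_tw) + H(p_y) - H(p)                   # I(W~ W; Y)
i_w = H(p_w) + H(p_y) - H(p_wy)                    # I(W; Y)
i_cond = H(p_tw) + H(p_wy) - H(p) - H(p_w)         # I(W~; Y | W)

p0 = p[:, 0, :] / p[:, 0, :].sum()                 # p(w~, y | W = 0)
i_at_w0 = H(p0.sum(axis=1)) + H(p0.sum(axis=0)) - H(p0)   # I(W~; Y | W = 0)
print(i_pair - (i_w + i_cond), i_cond - p_w[0] * i_at_w0)
```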

B. Proof of Statement 2:

Take an arbitrary $\tilde U$ satisfying $\tilde U\to U\to VXYZ$. Let $W=\tilde U$, and keep $U$ and $V$ unchanged. Recall that $t(x)\in\mathcal T(q(y,z|x))$ and that $p^*(u,v|x)$ maximizes $I(U;Y)+I(V;Z)-I(U;V)$. We can therefore write

$$I(U;Y)+I(V;Z)-I(U;V)\ge\lambda I(W;Y)+(1-\lambda)I(W;Z)+I(U;Y|W)+I(V;Z|W)-I(U;V|W). \qquad (21)$$

Furthermore, if equality holds, we must have $I(W;Y)=I(W;Z)=0$. We prove that this implies $I(\tilde U;Y)\ge I(\tilde U;VZ)$. We can write

$$I(U;Y)+I(V;Z)-I(U;V)\ge\lambda I(W;Y)+(1-\lambda)I(W;Z)+I(U;Y|W)+I(V;Z|W)-I(U;V|W)$$
$$=\lambda I(\tilde U;Y)+(1-\lambda)I(\tilde U;Z)+I(U;Y|\tilde U)+I(V;Z|\tilde U)-I(U;V|\tilde U).$$

Since $\tilde U\to U\to VXYZ$, we have $I(U;Y)=I(U\tilde U;Y)$ and $I(U;V)=I(U\tilde U;V)$. Hence $I(U;Y|\tilde U)=I(U;Y)-I(\tilde U;Y)$ and $I(U;V|\tilde U)=I(U;V)-I(\tilde U;V)$. This implies that

$$I(U;Y)+I(V;Z)-I(U;V)\ge\lambda I(\tilde U;Y)+(1-\lambda)I(\tilde U;Z)+I(V;Z|\tilde U)+I(U;Y)-I(\tilde U;Y)-I(U;V)+I(\tilde U;V).$$

Cancelling $I(U;Y)$ and $I(U;V)$, and using $I(V;Z|\tilde U)+I(V;\tilde U)=I(V;Z\tilde U)$, we obtain

$$I(\tilde U;Y)+I(V;Z)\ge\lambda I(\tilde U;Y)+(1-\lambda)I(\tilde U;Z)+I(V;Z\tilde U),$$

or,

$$(1-\lambda)I(\tilde U;Y)\ge(1-\lambda)I(\tilde U;Z)+I(V;\tilde U|Z).$$

In other words,

$$(1-\lambda)I(\tilde U;Y)\ge(1-\lambda)I(\tilde U;VZ)+\lambda I(V;\tilde U|Z). \qquad (22)$$

Let us consider the following two cases.

$\lambda<1$: In this case, equation (22) implies that $I(\tilde U;Y)\ge I(\tilde U;VZ)+\frac{\lambda}{1-\lambda}I(V;\tilde U|Z)$. This inequality implies the desired inequality $I(\tilde U;Y)\ge I(\tilde U;VZ)$.

$\lambda=1$: In this case, equation (22) implies that $I(V;\tilde U|Z)=0$. Furthermore, equation (21) holds with equality, since for $\lambda=1$ the steps from (21) to (22) are all equalities. Since $t(x)\in\mathcal T$, we must have $I(\tilde U;Y)=I(\tilde U;Z)=0$. The fact that $I(V;\tilde U|Z)=I(\tilde U;Y)=I(\tilde U;Z)=0$ implies that $I(\tilde U;Y)=I(\tilde U;ZV)=0$. Therefore the inequality $I(\tilde U;Y)\ge I(\tilde U;ZV)$ also holds in this case.

In each case, we are done. The proof of the inequality $I(\tilde V;Z)\ge I(\tilde V;YU)$ is similar.
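The information identities used to pass from (21) to (22) can be verified numerically. The Python sketch below draws a hypothetical random joint pmf of $(\tilde U,U,V,Y,Z)$ satisfying $\tilde U\to U\to VYZ$; the sizes and $\lambda$ are assumptions. It checks three identities:

• $I(U;Y|\tilde U)=I(U;Y)-I(\tilde U;Y)$;

• $I(U;V|\tilde U)=I(U;V)-I(\tilde U;V)$;

• $(1-\lambda)I(\tilde U;Z)+I(V;\tilde U|Z)=(1-\lambda)I(\tilde U;VZ)+\lambda I(V;\tilde U|Z)$.

It does not check the optimality assumptions of Statement 2, only these identities.

```python
import numpy as np

def entropy(p, axes):
    """Entropy (nats) of the marginal of the array p on the given axes."""
    drop = tuple(k for k in range(p.ndim) if k not in axes)
    m = np.asarray(p.sum(axis=drop)).ravel()
    m = m[m > 0]
    return -np.sum(m * np.log(m))

def cmi(p, A, B, C=()):
    """I(A; B | C) for disjoint groups of axes A, B, C."""
    A, B, C = set(A), set(B), set(C)
    return entropy(p, A | C) + entropy(p, B | C) - entropy(p, A | B | C) - entropy(p, C)

rng = np.random.default_rng(7)
nT, nU, nV, nY, nZ = 2, 3, 2, 2, 3           # hypothetical sizes; T stands for U~
lam = 0.6                                    # hypothetical lambda in [0, 1]

p_u = rng.dirichlet(np.ones(nU))
p_t_u = rng.dirichlet(np.ones(nT), size=nU)                          # p(u~ | u)
p_vyz_u = rng.dirichlet(np.ones(nV * nY * nZ), size=nU).reshape(nU, nV, nY, nZ)
# Joint over axes (U~, U, V, Y, Z) with the Markov chain U~ -> U -> (V, Y, Z).
p = np.einsum('u,ut,uvyz->tuvyz', p_u, p_t_u, p_vyz_u)
T, U, V, Y, Z = range(5)

markov_gap = cmi(p, [T], [V, Y, Z], [U])                             # should be 0
res_y = cmi(p, [U], [Y], [T]) - (cmi(p, [U], [Y]) - cmi(p, [T], [Y]))
res_v = cmi(p, [U], [V], [T]) - (cmi(p, [U], [V]) - cmi(p, [T], [V]))
res_22 = ((1 - lam) * cmi(p, [T], [Z]) + cmi(p, [V], [T], [Z])
          - (1 - lam) * cmi(p, [T], [V, Z]) - lam * cmi(p, [V], [T], [Z]))
print(markov_gap, res_y, res_v, res_22)
```

The first two identities rely on the Markov chain. The third is a plain chain-rule identity that holds for any joint distribution.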



